Search Results (618)

Search Parameters:
Keywords = mask-RCNN

22 pages, 825 KiB  
Article
Conformal Segmentation in Industrial Surface Defect Detection with Statistical Guarantees
by Cheng Shen and Yuewei Liu
Mathematics 2025, 13(15), 2430; https://doi.org/10.3390/math13152430 - 28 Jul 2025
Viewed by 236
Abstract
Detection of surface defects can significantly extend mechanical service life and mitigate potential safety risks. Traditional defect detection methods rely predominantly on manual inspection, which suffers from low efficiency and high costs. Machine learning algorithms and artificial intelligence models for defect detection, such as Convolutional Neural Networks (CNNs), achieve outstanding performance, but they are often data-dependent and cannot provide guarantees for new test samples. To this end, we construct a detection model by combining Mask R-CNN, selected for its strong baseline performance in pixel-level segmentation, with Conformal Risk Control. The former estimates the probability distribution that discriminates defective pixels from the rest of each sample. The detection model is improved by retraining with calibration data assumed to be independent and identically distributed (i.i.d.) with the test data. The latter constructs a prediction set on which a given detection guarantee holds. First, we define a loss function for each calibration sample to quantify detection error rates. We then derive a statistically rigorous threshold by optimizing the error rates against a given guarantee significance, used as the risk level. With this threshold, high-probability defective pixels in test images are extracted to construct prediction sets. This methodology ensures that the expected error rate on the test set remains strictly bounded by the predefined risk level. Furthermore, our model maintains robust and efficient control over the expected test-set error rate as calibration-to-test partitioning ratios vary.
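
To make the calibration step concrete, the following NumPy sketch selects the largest probability threshold whose finite-sample-adjusted calibration risk stays at or below the target risk level. The miss-rate loss and the candidate grid are illustrative assumptions, not the authors' exact definitions.

```python
import numpy as np

def calibrate_threshold(cal_probs, cal_masks, alpha):
    """Return the largest threshold whose adjusted calibration risk <= alpha.

    cal_probs: (n, H, W) per-pixel defect probabilities on calibration images
    cal_masks: (n, H, W) boolean ground-truth defect masks
    alpha:     target risk level, e.g. 0.1 for a 90% guarantee
    """
    n = len(cal_probs)
    best = 0.0  # threshold 0 keeps every pixel, so the miss rate is 0
    for lam in np.linspace(0.0, 1.0, 101):
        pred_sets = cal_probs >= lam                      # conformal prediction sets
        missed = (cal_masks & ~pred_sets).sum(axis=(1, 2))
        totals = np.maximum(cal_masks.sum(axis=(1, 2)), 1)
        risk = (missed / totals).mean()                   # mean miss rate per image
        # Finite-sample adjustment from conformal risk control (max loss B = 1)
        if (n / (n + 1)) * risk + 1.0 / (n + 1) <= alpha:
            best = lam
    return best

# Illustrative usage with synthetic calibration data
rng = np.random.default_rng(0)
probs = rng.random((50, 64, 64))
masks = probs > 0.7            # toy ground truth correlated with the scores
print(calibrate_threshold(probs, masks, alpha=0.1))
```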

20 pages, 4920 KiB  
Article
Martian Skylight Identification Based on the Deep Learning Model
by Lihong Li, Lingli Mu, Wei Zhang, Weihua Dong and Yuqing He
Remote Sens. 2025, 17(15), 2571; https://doi.org/10.3390/rs17152571 - 24 Jul 2025
Viewed by 276
Abstract
As a type of distinctive pit on Mars, skylights are entrances to subsurface lava caves. They are important for studying volcanic activity and potentially preserved water ice, and are also considered candidate sites for future human extraterrestrial bases. Most skylights are identified manually, which is inefficient and highly subjective. Although deep learning methods have recently been used to identify skylights, they face the challenges of few effective samples and low identification accuracy. In this article, 151 positive samples and 920 negative samples derived from MRO-HiRISE image data were used to create an initial skylight dataset, which contained few positive samples. To augment the initial dataset, StyleGAN2-ADA was selected to synthesize additional positive samples, yielding an augmented dataset of 896 samples. On the basis of the augmented skylight dataset, we propose YOLOv9-Skylight for skylight identification, incorporating Inner-EIoU loss and DySample to enhance localization accuracy and feature extraction ability. Compared with YOLOv9, the precision, recall, and F1-score of YOLOv9-Skylight improved by about 9.1%, 2.8%, and 5.6%, respectively. Compared with other mainstream models such as YOLOv5, YOLOv10, Faster R-CNN, Mask R-CNN, and DETR, YOLOv9-Skylight achieved the highest accuracy (F1 = 92.5%), demonstrating strong performance in skylight identification.
(This article belongs to the Special Issue Remote Sensing and Photogrammetry Applied to Deep Space Exploration)
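
For readers tracking the reported metrics, this worked example shows how precision, recall, and F1-score relate; the detection counts are hypothetical, not taken from the paper.

```python
# Worked example of P/R/F1 (counts are hypothetical).
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 111 true positives, 9 false positives, 9 false negatives
p, r, f1 = prf1(111, 9, 9)
print(f"P={p:.3f}, R={r:.3f}, F1={f1:.3f}")  # P=0.925, R=0.925, F1=0.925
```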

16 pages, 2721 KiB  
Article
An Adapter and Segmentation Network-Based Approach for Automated Atmospheric Front Detection
by Xinya Ding, Xuan Peng, Yanguang Xue, Liang Zhang, Tianying Wang and Yunpeng Zhang
Appl. Sci. 2025, 15(14), 7855; https://doi.org/10.3390/app15147855 - 14 Jul 2025
Viewed by 159
Abstract
This study presents AD-MRCNN, an advanced deep learning framework for automated atmospheric front detection that addresses two critical limitations of existing methods. First, current approaches feed raw meteorological data directly into the model without optimizing feature compatibility, potentially hindering performance. Second, they typically provide only frontal category information without identifying individual frontal systems. Our solution integrates two key innovations: (1) an intelligent adapter module that performs adaptive feature fusion, automatically weighting and combining multi-source meteorological inputs (including temperature, wind fields, and humidity data) to maximize their synergistic effects while minimizing feature conflicts; the adapter yields an average improvement of over 4% across various metrics. (2) An enhanced instance segmentation network based on the Mask R-CNN architecture that simultaneously achieves precise frontal type classification (cold/warm/stationary/occluded), accurate spatial localization, and identification of distinct frontal systems. Comprehensive evaluation using ERA5 reanalysis data (2009–2018) demonstrates significant improvements, including an 85.1% F1-score, outperforming traditional methods (TFP: 63.1%) and deep learning approaches (U-Net: 83.3%), and a 31% reduction in false alarms compared to semantic segmentation methods. The framework's modular design allows for potential application to other meteorological feature detection tasks. Future work will focus on incorporating temporal dynamics for frontal evolution prediction.
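
As a rough illustration of adaptive feature fusion, the PyTorch sketch below learns one weight per meteorological source and mixes the weighted fields before they reach a segmentation network. The module layout and tensor sizes are assumptions, not the authors' AD-MRCNN code.

```python
import torch
import torch.nn as nn

class FusionAdapter(nn.Module):
    """Weight and combine multi-source meteorological fields (sketch)."""
    def __init__(self, n_sources: int, channels: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_sources))  # one weight per source
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, sources: list[torch.Tensor]) -> torch.Tensor:
        w = torch.softmax(self.logits, dim=0)             # weights sum to 1
        fused = sum(wi * x for wi, x in zip(w, sources))  # adaptive fusion
        return self.mix(fused)                            # let channels interact

adapter = FusionAdapter(n_sources=3, channels=8)
t, u, q = (torch.randn(1, 8, 64, 64) for _ in range(3))  # temperature, wind, humidity
out = adapter([t, u, q])   # (1, 8, 64, 64), input to the segmentation network
```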

17 pages, 3331 KiB  
Article
Automated Cattle Head and Ear Pose Estimation Using Deep Learning for Animal Welfare Research
by Sueun Kim
Vet. Sci. 2025, 12(7), 664; https://doi.org/10.3390/vetsci12070664 - 13 Jul 2025
Viewed by 399
Abstract
With the increasing importance of animal welfare, behavioral indicators such as changes in head and ear posture are widely recognized as non-invasive, field-applicable markers for evaluating the emotional state and stress levels of animals. However, traditional visual observation methods are often subjective, as assessments can vary between observers, and they are unsuitable for long-term, quantitative monitoring. This study proposes an artificial intelligence (AI)-based system for the detection and pose estimation of cattle heads and ears using deep learning. The system integrates Mask R-CNN for accurate object detection and FSA-Net for robust 3D pose estimation (yaw, pitch, and roll) of cattle heads and left ears. Comprehensive datasets were constructed from images of Japanese Black cattle collected under natural conditions and annotated for both detection and pose estimation tasks. The proposed framework achieved mean average precision (mAP) values of 0.79 for head detection and 0.71 for left ear detection, and a mean absolute error (MAE) of approximately 8–9° for pose estimation, demonstrating reliable performance across diverse orientations. This approach enables long-term, quantitative, and objective monitoring of cattle behavior, offering significant advantages over traditional subjective stress assessment methods. The developed system holds promise for practical applications in animal welfare research and real-time farm management.
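
A minimal sketch of this two-stage pipeline, with torchvision's off-the-shelf Mask R-CNN standing in for the trained head/ear detector; the pose stage and its preprocessing are placeholders.

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = torch.rand(3, 480, 640)      # stand-in frame; a real pipeline loads camera images
with torch.no_grad():
    out = model([img])[0]          # dict with "boxes", "scores", "labels", "masks"

keep = out["scores"] > 0.7         # keep confident detections only
for box in out["boxes"][keep]:
    x1, y1, x2, y2 = box.int().tolist()
    crop = img[:, y1:y2, x1:x2]    # cropped region for the pose estimator
    # yaw, pitch, roll = fsa_net(preprocess(crop))   # FSA-Net stage (not shown)
```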

21 pages, 2471 KiB  
Article
Attention-Based Mask R-CNN Enhancement for Infrared Image Target Segmentation
by Liang Wang and Kan Ren
Symmetry 2025, 17(7), 1099; https://doi.org/10.3390/sym17071099 - 9 Jul 2025
Viewed by 380
Abstract
Image segmentation is an important task in image processing, and infrared (IR) image segmentation remains a challenge in this field due to the unique characteristics of IR data. Infrared imaging uses the infrared radiation emitted by objects to produce images, which can complement visible-light images under adverse lighting conditions. However, the low spatial resolution and limited texture detail of IR images hinder high-precision segmentation. To address these issues, an attention mechanism based on symmetrical cross-channel interaction, motivated by symmetry principles in computer vision, was integrated into a Mask Region-Based Convolutional Neural Network (Mask R-CNN) framework. A Bottleneck-enhanced Squeeze-and-Attention (BNSA) module was incorporated into the backbone network, and novel loss functions were designed for both the bounding box (Bbox) regression and mask prediction branches to improve segmentation performance. Furthermore, a dedicated infrared image dataset was constructed to validate the proposed method. The experimental results show that the optimized model achieves higher segmentation accuracy than the original network and other mainstream segmentation models on our dataset, demonstrating how symmetrical design principles can improve complex vision tasks.
(This article belongs to the Special Issue Symmetry and Its Applications in Computer Vision)
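
The BNSA module itself is not reproduced here, but channel attention built on cross-channel interaction can be sketched with an ECA-style block, in which each channel's weight depends symmetrically on a window of neighbouring channels:

```python
import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    """ECA-style attention sketch, not the paper's exact BNSA module."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # A 1D conv slides over the channel axis: each channel is reweighted
        # from a symmetric window of k neighbouring channels.
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = self.pool(x).view(b, 1, c)                   # (B, 1, C) channel descriptor
        y = torch.sigmoid(self.conv(y)).view(b, c, 1, 1)
        return x * y                                     # reweighted feature maps

x = torch.randn(2, 64, 32, 32)
print(CrossChannelAttention()(x).shape)                  # torch.Size([2, 64, 32, 32])
```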

21 pages, 33500 KiB  
Article
Location Research and Picking Experiment of an Apple-Picking Robot Based on Improved Mask R-CNN and Binocular Vision
by Tianzhong Fang, Wei Chen and Lu Han
Horticulturae 2025, 11(7), 801; https://doi.org/10.3390/horticulturae11070801 - 6 Jul 2025
Viewed by 437
Abstract
With the advancement of agricultural automation, apple-harvesting robots have gradually become a research focus. As their "perceptual core," machine vision systems directly determine picking success rates and operational efficiency. However, existing vision systems still exhibit significant shortcomings in target detection and positioning accuracy in complex orchard environments (e.g., uneven illumination, foliage occlusion, and fruit overlap), which hinders practical application. This study proposes a vision system for apple-harvesting robots based on an improved Mask R-CNN and binocular vision to achieve more precise fruit positioning. The binocular camera (ZED2i) mounted on the robot acquires dual-channel apple images. The improved Mask R-CNN performs instance segmentation of apple targets in the binocular images, followed by a template-matching algorithm with parallel epipolar constraints for stereo matching. Four pairs of feature points from corresponding apples in the binocular images are selected to calculate disparity and depth. Experimental results demonstrate an average coefficient of variation of 5.09% and a positioning accuracy of 99.61% in binocular positioning. During harvesting operations with a self-designed apple-picking robot, the single-image processing time was 0.36 s, the average single harvesting cycle lasted 7.7 s, and the overall harvesting success rate reached 94.3%. This work presents a novel high-precision visual positioning method for apple-harvesting robots.
(This article belongs to the Section Fruit Production Systems)
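
The depth computation in such binocular pipelines follows standard stereo triangulation, Z = f·B/d; the sketch below uses illustrative numbers, not the ZED2i's actual calibration.

```python
def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Stereo triangulation: f_px in pixels, baseline in metres, returns metres."""
    return f_px * baseline_m / disparity_px

# Average the disparity of the four matched feature points on one apple
disparities = [41.8, 42.1, 42.4, 41.9]       # hypothetical matches, in pixels
d_mean = sum(disparities) / len(disparities)
z = depth_from_disparity(f_px=1050.0, baseline_m=0.12, disparity_px=d_mean)
print(f"estimated depth: {z:.3f} m")         # ~3.0 m
```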

20 pages, 2627 KiB  
Article
Automated Detection of Center-Pivot Irrigation Systems from Remote Sensing Imagery Using Deep Learning
by Aliasghar Bazrafkan, James Kim, Rob Proulx and Zhulu Lin
Remote Sens. 2025, 17(13), 2276; https://doi.org/10.3390/rs17132276 - 3 Jul 2025
Viewed by 459
Abstract
Effective detection of center-pivot irrigation systems is crucial for understanding agricultural activity and managing groundwater resources for sustainable use, especially in semi-arid regions such as North Dakota, where irrigation depends primarily on groundwater. In this study, we adopted YOLOv11 to detect center-pivot irrigation systems using multiple remote sensing datasets, including Landsat 8, Sentinel-2, and NAIP (National Agriculture Imagery Program) imagery. We developed a custom ArcGIS tool to facilitate data preparation and large-scale model execution for YOLOv11, which is not included in the ArcGIS Pro deep learning package. YOLOv11 was compared against other popular deep learning architectures such as U-Net, Faster R-CNN, and Mask R-CNN. YOLOv11, using Landsat 8 panchromatic data, achieved the highest detection accuracy (precision: 0.98; recall: 0.91; F1-score: 0.94) among all tested datasets and models. Spatial autocorrelation and hotspot analysis revealed systematic prediction errors, suggesting a need to adjust training data regionally. Our research demonstrates the potential of deep learning combined with GIS-based workflows for large-scale irrigation system analysis, supporting the adoption of precision agriculture technologies for sustainable water resource management.
(This article belongs to the Special Issue Remote Sensing of Agricultural Water Resources)
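
Large-scene detection of this kind generally requires chipping imagery into tiles the detector can ingest; below is one plausible tiling scheme (tile size and overlap are assumptions, and this is not the authors' ArcGIS tool).

```python
import numpy as np

def tile_image(img: np.ndarray, size: int = 640, overlap: int = 64):
    """Yield (row, col, tile) windows covering an (H, W, C) array."""
    step = size - overlap
    h, w = img.shape[:2]
    for r in range(0, max(h - overlap, 1), step):
        for c in range(0, max(w - overlap, 1), step):
            # Offsets (r, c) map tile detections back to scene coordinates
            yield r, c, img[r:r + size, c:c + size]

scene = np.zeros((2000, 3000, 3), dtype=np.uint8)  # stand-in for an image chip
print(sum(1 for _ in tile_image(scene)))           # 24 tiles at these settings
```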

22 pages, 8689 KiB  
Article
Transfer Learning-Based Accurate Detection of Shrub Crown Boundaries Using UAS Imagery
by Jiawei Li, Huihui Zhang and David Barnard
Remote Sens. 2025, 17(13), 2275; https://doi.org/10.3390/rs17132275 - 3 Jul 2025
Viewed by 356
Abstract
The accurate delineation of shrub crown boundaries is critical for ecological monitoring, land management, and understanding vegetation dynamics in fragile ecosystems such as semi-arid shrublands. While traditional image processing techniques often struggle with overlapping canopies, deep learning methods such as convolutional neural networks (CNNs) offer promising solutions for precise segmentation. This study employed high-resolution imagery captured by unmanned aircraft systems (UASs) throughout the shrub growing season and explored the effectiveness of transfer learning for both semantic segmentation (Attention U-Net) and instance segmentation (Mask R-CNN), using pre-trained model weights from two previous studies that originally focused on tree crown delineation to improve shrub crown segmentation in non-forested areas. Results showed that transfer learning alone did not achieve satisfactory performance, owing to differences in object characteristics and environmental conditions. However, fine-tuning the pre-trained models by unfreezing additional layers improved segmentation accuracy by around 30%. The fine-tuned models showed limited sensitivity to shrubs early in the growing season (April to June) and improved performance once shrub crowns became more spectrally distinct in late summer (July to September). These findings highlight the value of combining pre-trained models with targeted fine-tuning to enhance model adaptability in complex remote sensing environments. The proposed framework offers a scalable solution for ecological monitoring in data-scarce regions, supporting informed land management decisions and advancing the use of deep learning for long-term environmental monitoring.
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)
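
The fine-tuning recipe described above (start from transferred weights, freeze them, then unfreeze deeper layers) can be sketched as follows; the layer names follow torchvision's Mask R-CNN, not the cited tree-crown checkpoints.

```python
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

for p in model.parameters():
    p.requires_grad = False          # freeze all transferred weights

for name, p in model.named_parameters():
    # Unfreeze the deepest backbone stage plus the task heads for adaptation
    if name.startswith(("backbone.body.layer4", "rpn", "roi_heads")):
        p.requires_grad = True

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} tensors will be updated during fine-tuning")
```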

36 pages, 15335 KiB  
Article
An Application of Deep Learning Models for the Detection of Cocoa Pods at Different Ripening Stages: An Approach with Faster R-CNN and Mask R-CNN
by Juan Felipe Restrepo-Arias, María José Montoya-Castaño, María Fernanda Moreno-De La Espriella and John W. Branch-Bedoya
Computation 2025, 13(7), 159; https://doi.org/10.3390/computation13070159 - 2 Jul 2025
Viewed by 650
Abstract
The accurate classification of cocoa pod ripeness is critical for optimizing harvest timing, improving post-harvest processing, and ensuring consistent quality in chocolate production. Traditional ripeness assessment methods are often subjective, labor-intensive, or destructive, highlighting the need for automated, non-invasive solutions. This study evaluates the performance of R-CNN-based deep learning models (Faster R-CNN and Mask R-CNN) for the detection and segmentation of cocoa pods across four ripening stages (0–2 months, 2–4 months, 4–6 months, and >6 months) in the context of precision agriculture, using the publicly accessible RipSetCocoaCNCH12 dataset of 4116 labeled images collected under real-world field conditions. Initial experiments using pretrained weights and standard configurations on a custom COCO-format dataset yielded promising baselines: Faster R-CNN achieved a mean average precision (mAP) of 64.15% and Mask R-CNN 60.81%, with the highest per-class precision on mature pods (C4) but weaker detection in early stages (C1). To improve robustness, the dataset was subsequently augmented and balanced, followed by targeted hyperparameter optimization for both architectures. The refined models were then benchmarked against state-of-the-art YOLOv8 networks (YOLOv8x and YOLOv8l-seg). YOLOv8x achieved the highest mAP of 86.36%, outperforming YOLOv8l-seg (83.85%), Mask R-CNN (73.20%), and Faster R-CNN (67.75%) in overall detection accuracy. However, the R-CNN models offered valuable instance-level segmentation insights, particularly against complex backgrounds. A qualitative evaluation using confidence heatmaps and error analysis revealed that the R-CNN architectures occasionally missed small or partially occluded pods. These findings highlight the complementary strengths of region-based and real-time detectors in precision agriculture and emphasize the need for class-specific enhancements and interpretability tools in real-world deployments.
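
As a sketch of how such pretrained R-CNN baselines are adapted to the four ripeness classes, the standard torchvision recipe replaces the box-predictor head; the details below are assumptions, not the authors' configuration.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 5  # background + four ripening stages (C1-C4)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the COCO-trained classification head for one sized to our classes
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
# The model can now be fine-tuned on COCO-format annotations such as
# those of RipSetCocoaCNCH12.
```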

21 pages, 4394 KiB  
Article
Deep Learning Models for Detection and Severity Assessment of Cercospora Leaf Spot (Cercospora capsici) in Chili Peppers Under Natural Conditions
by Douglas Vieira Leite, Alisson Vasconcelos de Brito, Gregorio Guirada Faccioli and Gustavo Haddad Souza Vieira
Plants 2025, 14(13), 2011; https://doi.org/10.3390/plants14132011 - 1 Jul 2025
Viewed by 383
Abstract
The accurate assessment of plant disease severity is crucial for effective crop management. Deep learning, especially via CNNs, is widely used for image segmentation in plant lesion detection, but accurately assessing disease severity across varied environmental conditions remains challenging. This study evaluates eight deep learning models for detecting and quantifying Cercospora leaf spot (Cercospora capsici) severity in chili peppers under natural field conditions. A custom dataset of 1645 chili pepper leaf images, collected from a Brazilian plantation and annotated with 6282 lesions, was developed to reflect real-world variability in lighting and background. First, an algorithm was developed to process the raw images, applying ROI selection and background removal. Then, four YOLOv8 and four Mask R-CNN models were fine-tuned for pixel-level segmentation and severity classification, comparing one-stage and two-stage models to offer practical insights for agricultural applications. In pixel-level segmentation on the test dataset, Mask R-CNN achieved superior precision, with a mean intersection over union (MIoU) of 0.860 and an F1-score of 0.924 for the mask_rcnn_R101_FPN_3x model, compared to 0.808 and 0.893 for the YOLOv8s-Seg model. However, in severity classification, Mask R-CNN underestimated higher severity levels, with an accuracy of 72.3% for level III, whereas YOLOv8 attained 91.4%. YOLOv8 was also more efficient, with an inference time of 27 ms versus 89 ms for Mask R-CNN. While Mask R-CNN excels in segmentation accuracy, YOLOv8 offers a compelling balance of speed and reliable severity classification, making it suitable for real-time plant disease assessment in agricultural applications.
(This article belongs to the Section Plant Protection and Biotic Interactions)
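
One plausible way to map pixel-level lesion masks to a severity level is via the lesion-to-leaf area ratio, sketched below; the thresholds are hypothetical, since the paper's level definitions are not given here.

```python
import numpy as np

def severity_level(leaf_mask: np.ndarray, lesion_mask: np.ndarray) -> int:
    """Classify severity from the lesion-to-leaf area ratio (sketch)."""
    ratio = lesion_mask.sum() / max(leaf_mask.sum(), 1)
    thresholds = [0.01, 0.05, 0.15]      # hypothetical level boundaries
    return int(np.searchsorted(thresholds, ratio)) + 1   # levels I..IV

leaf = np.ones((256, 256), dtype=bool)   # stand-in segmented leaf
lesion = np.zeros_like(leaf)
lesion[:40, :40] = True                  # stand-in predicted lesion pixels
print(severity_level(leaf, lesion))      # ratio ~2.4% -> level 2 (II)
```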

26 pages, 12802 KiB  
Article
Indirect Estimation of Seagrass Frontal Area for Coastal Protection: A Mask R-CNN and Dual-Reference Approach
by Than Van Chau, Somi Jung, Minju Kim and Won-Bae Na
J. Mar. Sci. Eng. 2025, 13(7), 1262; https://doi.org/10.3390/jmse13071262 - 29 Jun 2025
Viewed by 346
Abstract
Seagrass constitutes a vital component of coastal ecosystems, providing a wide array of ecosystem services. Accurate measurement of the seagrass frontal area is crucial for assessing its capacity to inhibit water flow and reduce wave energy; however, few effective indirect methods exist. To address this limitation, we developed an indirect method that combines a Mask R-CNN model with a dual-reference approach for detecting seagrass and estimating its frontal area. A laboratory-scale underwater camera experiment generated an experimental dataset, which was partitioned into training, validation, and test sets. After training, evaluation metrics (including IoU, accuracy, precision, recall, and F1-score) approached their upper limits and remained within acceptable ranges. Validation on real seagrass images confirmed satisfactory performance, albeit with slightly lower metrics than on the experimental dataset. The method estimated seagrass frontal areas with errors below 10% (maximum 7.68%, minimum −0.43%), achieving high accuracy by accounting for seagrass bending under flowing water. We also showed that indirect measurement significantly influences estimates of seagrass bending height and wave height reduction capacity, mitigating the overestimation associated with traditional direct methods. This indirect approach thus offers a promising, environmentally friendly alternative that overcomes the limitations of conventional techniques.
(This article belongs to the Section Ocean Engineering)
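
The paper's dual-reference procedure is not reproduced here, but the general idea of calibrating a pixel-to-metric scale from two references of known size can be sketched as follows; the interpolation scheme and all numbers are assumptions.

```python
def frontal_area_m2(blade_area_px: float,
                    ref_near: tuple[float, float],
                    ref_far: tuple[float, float],
                    blade_pos: float) -> float:
    """ref_*: (known_area_m2, measured_area_px); blade_pos in [0, 1]
    interpolates between the near (0) and far (1) reference planes."""
    scale_near = ref_near[0] / ref_near[1]   # m^2 per pixel at the near plane
    scale_far = ref_far[0] / ref_far[1]
    scale = scale_near + blade_pos * (scale_far - scale_near)
    return blade_area_px * scale

area = frontal_area_m2(blade_area_px=12500,
                       ref_near=(0.01, 4000), ref_far=(0.01, 2500),
                       blade_pos=0.5)
print(f"{area:.4f} m^2")                     # illustrative output
```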

29 pages, 3799 KiB  
Article
Forest Three-Dimensional Reconstruction Method Based on High-Resolution Remote Sensing Image Using Tree Crown Segmentation and Individual Tree Parameter Extraction Model
by Guangsen Ma, Gang Yang, Hao Lu and Xue Zhang
Remote Sens. 2025, 17(13), 2179; https://doi.org/10.3390/rs17132179 - 25 Jun 2025
Viewed by 412
Abstract
Efficient and accurate acquisition of tree distribution and three-dimensional geometric information in forest scenes, along with three-dimensional reconstruction of entire forest environments, holds significant application value in precision forestry and forestry digital twins. However, due to complex vegetation structures, fine geometric details, and severe occlusions in forest environments, existing methods, whether vision-based or LiDAR-based, still face challenges such as high data acquisition costs, difficult feature extraction, and limited reconstruction accuracy. This study focuses on reconstructing tree distribution and extracting key individual tree parameters, and proposes a forest 3D reconstruction framework based on high-resolution remote sensing images. First, an optimized Mask R-CNN model segments individual tree crowns and extracts distribution information. A Tree Parameter and Reconstruction Network (TPRN) then estimates key structural parameters (height, DBH, etc.) directly from the crown images and generates 3D tree models. The 3D forest scene is subsequently reconstructed by combining the distribution information with the individual tree models. To address data scarcity, a hybrid training strategy integrating virtual and real data was proposed for crown segmentation and individual tree parameter estimation. Experimental results demonstrate that the proposed method can reconstruct an entire forest scene within seconds while accurately preserving tree distribution and individual tree attributes. In two real-world plots, tree counting accuracy exceeded 90%, with an average tree localization error under 0.2 m. The TPRN achieved parameter extraction accuracies of 92.7% and 96% for tree height, and 95.4% and 94.1% for DBH. The generated individual tree models achieved average Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) scores of 11.24 and 0.53, respectively, validating the reconstruction quality. This approach enables fast and effective large-scale forest scene reconstruction from a single remote sensing image, demonstrating significant potential for dynamic forest resource monitoring and forestry-oriented digital twin systems.
(This article belongs to the Special Issue Digital Modeling for Sustainable Forest Management)
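
The reported PSNR and SSIM scores can be computed with scikit-image, as in this minimal sketch (the images here are random stand-ins for rendered and reference views):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

ref = np.random.rand(128, 128).astype(np.float32)      # reference view (stand-in)
render = np.clip(ref + 0.1 * np.random.randn(128, 128).astype(np.float32), 0, 1)

psnr = peak_signal_noise_ratio(ref, render, data_range=1.0)
ssim = structural_similarity(ref, render, data_range=1.0)
print(f"PSNR={psnr:.2f} dB, SSIM={ssim:.3f}")
```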

22 pages, 47906 KiB  
Article
Spatial Localization of Broadleaf Species in Mixed Forests in Northern Japan Using UAV Multi-Spectral Imagery and Mask R-CNN Model
by Nyo Me Htun, Toshiaki Owari, Satoshi N. Suzuki, Kenji Fukushi, Yuuta Ishizaki, Manato Fushimi, Yamato Unno, Ryota Konda and Satoshi Kita
Remote Sens. 2025, 17(13), 2111; https://doi.org/10.3390/rs17132111 - 20 Jun 2025
Viewed by 675
Abstract
Precise spatial localization of broadleaf species is crucial for efficient forest management and ecological studies. This study presents an approach for segmenting and classifying broadleaf tree species, including Japanese oak (Quercus crispula), in mixed forests using multi-spectral imagery captured by unmanned aerial vehicles (UAVs) and deep learning. High-resolution UAV images, including RGB and NIR bands, were collected from two study sites in Hokkaido, Japan: Sub-compartment 97g in the eastern region and Sub-compartment 68E in the central region. A Mask Region-based Convolutional Neural Network (Mask R-CNN) framework was employed to recognize and classify single tree crowns based on annotated training data. The workflow incorporated UAV-derived imagery and crown annotations, supporting reliable model development and evaluation. Results showed that combining the multi-spectral bands (RGB and NIR) with canopy height model (CHM) data significantly improved classification performance at both study sites. In Sub-compartment 97g, RGB + NIR + CHM achieved a precision of 0.76, recall of 0.74, and F1-score of 0.75, compared with 0.73, 0.74, and 0.73 for RGB alone; 0.68, 0.70, and 0.66 for RGB + NIR; and 0.63, 0.67, and 0.63 for RGB + CHM. Similarly, in Sub-compartment 68E, RGB + NIR + CHM attained a precision of 0.81, recall of 0.78, and F1-score of 0.80, outperforming RGB alone (0.79, 0.79, 0.78), RGB + NIR (0.75, 0.74, 0.72), and RGB + CHM (0.76, 0.75, 0.74). These consistent improvements across diverse forest conditions highlight the effectiveness of integrating spectral (RGB and NIR) and structural (CHM) data, and underscore the value of coupling UAV multi-spectral imagery with deep learning for reliable, large-scale tree species identification and forest monitoring.
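
The winning band combination amounts to stacking spectral and structural rasters into one multi-channel input, as sketched below; the array shapes are assumptions, and a Mask R-CNN consuming this input would need its first convolution widened to five channels.

```python
import numpy as np

rgb = np.random.rand(512, 512, 3).astype(np.float32)   # UAV orthomosaic tile
nir = np.random.rand(512, 512, 1).astype(np.float32)   # near-infrared band
chm = np.random.rand(512, 512, 1).astype(np.float32)   # canopy height model

chm /= max(float(chm.max()), 1e-6)             # scale the structure band to [0, 1]
x = np.concatenate([rgb, nir, chm], axis=-1)   # (512, 512, 5) network input
print(x.shape)
```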

21 pages, 18182 KiB  
Article
AgriLiteNet: Lightweight Multi-Scale Tomato Pest and Disease Detection for Agricultural Robots
by Chenghan Yang, Baidong Zhao, Madina Mansurova, Tianyan Zhou, Qiyuan Liu, Junwei Bao and Dingkun Zheng
Horticulturae 2025, 11(6), 671; https://doi.org/10.3390/horticulturae11060671 - 12 Jun 2025
Viewed by 448
Abstract
Real-time detection of tomato pests and diseases is essential for precision agriculture, which demands high accuracy, speed, and energy efficiency from edge-computing agricultural robots. This study proposes AgriLiteNet (Lightweight Networks for Agriculture), a lightweight neural network that integrates MobileNetV3 for local feature extraction with a streamlined Swin Transformer for global modeling. AgriLiteNet is further enhanced by a lightweight channel-spatial mixed attention module and a feature pyramid network, enabling the detection of nine tomato pests and diseases, including small targets such as spider mites, dense targets such as bacterial spot, and large targets such as late blight. It achieves a mean average precision of 0.98735 at an intersection-over-union threshold of 0.5, comparable to Suppression Mask R-CNN (0.98955) and Cas-VSwin Transformer (0.98874), and exceeding YOLOv5n (0.98249) and GMC-MobileV3 (0.98143). With 2.0 million parameters and 0.608 GFLOPs, AgriLiteNet delivers an inference speed of 35 frames per second at a power consumption of 15 W on the NVIDIA Jetson Orin NX, surpassing Suppression Mask R-CNN (8 FPS, 22 W) and Cas-VSwin Transformer (12 FPS, 20 W). The model's efficiency and compact design make it highly suitable for deployment on agricultural robots, supporting sustainable farming through precise pest and disease management.
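
Throughput figures like 35 FPS are typically obtained by averaging wall-clock latency over repeated forward passes; here is a minimal sketch with a stand-in model (not AgriLiteNet):

```python
import time
import torch
import torchvision

model = torchvision.models.mobilenet_v3_small(weights=None).eval()  # stand-in
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    for _ in range(10):                      # warm-up passes
        model(x)
    n = 100
    t0 = time.perf_counter()
    for _ in range(n):
        model(x)
    dt = (time.perf_counter() - t0) / n      # mean latency per frame

print(f"latency {dt * 1e3:.1f} ms  ->  {1.0 / dt:.1f} FPS")
```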

16 pages, 7816 KiB  
Article
The Initial Attitude Estimation of an Electromagnetic Projectile in the High-Temperature Flow Field Based on Mask R-CNN and the Multi-Constraints Genetic Algorithm
by Jinlong Chen, Miao Yu, Yongcai Guo and Chao Gao
Sensors 2025, 25(12), 3608; https://doi.org/10.3390/s25123608 - 8 Jun 2025
Viewed by 452
Abstract
During the launch of electromagnetic projectiles, radiated noise, smoke, and debris interfere with the line of sight and degrade the accuracy of initial attitude estimation. To address this issue, an enhanced method integrating Mask R-CNN and a multi-constraint genetic algorithm is proposed. First, Mask R-CNN performs pixel-level edge segmentation of the original image, followed by the Canny algorithm to extract the edge image. This edge image is then processed with the line segment detector (LSD) algorithm to identify the main structural components, characterized as line segments. An enhanced genetic algorithm is employed to restore the occluded edge image. A fitness function constructed with Hamming distance (HD) constraints, alongside initial parameter constraints defined by centroid displacement, is applied to accelerate convergence and avoid local optima. The optimized search strategy minimizes the HD between the repaired stereo images to obtain an accurate attitude output. An electromagnetic simulation device was used for the experiments. The proposed method was 13 times faster than the Structural Similarity Index (SSIM) method. In a single launch, a target with 70% occlusion was successfully recovered, achieving average deviations of 0.76°, 0.72°, and 0.44° in pitch, roll, and yaw, respectively.
(This article belongs to the Section Physical Sensors)
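
The Hamming-distance term of the fitness function can be sketched directly on binary edge maps; the genetic-algorithm machinery around it is omitted, and the images below are synthetic.

```python
import numpy as np

def hamming_distance(edges_a: np.ndarray, edges_b: np.ndarray) -> int:
    """Count pixels where two boolean edge maps disagree."""
    return int(np.count_nonzero(edges_a ^ edges_b))

a = np.zeros((64, 64), dtype=bool)
a[10:20, 10:20] = True                 # synthetic edge block
b = np.roll(a, 2, axis=1)              # the same block shifted by two pixels
print(hamming_distance(a, b))          # 40 disagreeing pixels

# In the GA, candidate restorations that reduce this distance between the
# repaired left and right edge images receive higher fitness.
```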
