Topic Editors

Prof. Dr. Bin Fan
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Dr. Wenqi Ren
School of Cyber Science and Technology, Sun Yat-Sen University, Guangzhou 510275, China

Applications in Image Analysis and Pattern Recognition

Abstract submission deadline: closed (31 May 2024)
Manuscript submission deadline: closed (31 August 2024)
Viewed by
282785

Topic Information

Dear Colleagues,

Up to ~80% of the neurons in the human brain may be involved in processing visual information and cognition. Image analysis and pattern recognition are therefore at the core of artificial intelligence, which aims to design computer programs that achieve or mimic human-like perception and inference in the real world. With the rapid development of visual sensors and imaging technologies, image analysis and pattern recognition techniques have been applied extensively across artificial intelligence-related areas, from industry and agriculture to surveillance and social security.

Despite the significant success of image analysis and pattern recognition methods over the past decade, their performance on real-world problems remains unsatisfactory, indicating a non-negligible gap between theoretical progress and practical application. This Topic collection aims to narrow that gap, and we therefore invite papers on both theoretical and applied issues in image analysis and pattern recognition.

All interested authors are invited to submit their innovative methods on the following (non-exhaustive) aspects:

  • Deep learning based methods for image analysis;
  • Deep learning based methods for video analysis;
  • Image fusion methods and applications;
  • Multimedia systems and applications;
  • Image enhancement and restoration methods and their applications;
  • Image analysis and pattern recognition for robotics and unmanned systems;
  • Document image analysis and applications;
  • Structural pattern recognition methods and applications;
  • Biomedical image analysis and applications;
  • Advances in pattern recognition theories.

Prof. Dr. Bin Fan
Dr. Wenqi Ren
Topic Editors

Keywords

  • image analysis
  • pattern recognition
  • structural pattern recognition
  • computer vision
  • multimedia analysis
  • deep learning
  • document image analysis
  • image enhancement
  • image restoration
  • biomedical image analysis
  • robotics
  • unmanned systems
  • image retrieval
  • image understanding
  • feature extraction
  • image segmentation
  • semantic segmentation
  • object detection
  • image classification
  • image acquiring techniques

Participating Journals

(Journal: Impact Factor, CiteScore, Launched Year, First Decision (median), APC)

Applied Sciences (applsci): Impact Factor 2.5, CiteScore 5.3, launched 2011, first decision 18.4 days, APC CHF 2400
Sensors (sensors): Impact Factor 3.4, CiteScore 7.3, launched 2001, first decision 18.6 days, APC CHF 2600
Journal of Imaging (jimaging): Impact Factor 2.7, CiteScore 5.9, launched 2015, first decision 18.3 days, APC CHF 1800
Machine Learning and Knowledge Extraction (make): Impact Factor 4.0, CiteScore 6.3, launched 2019, first decision 20.8 days, APC CHF 1800
Optics (optics): Impact Factor 1.1, CiteScore 2.2, launched 2020, first decision 18.4 days, APC CHF 1200

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics cooperates with Preprints.org, which has a direct connection to MDPI journals. Authors are encouraged to post a preprint at Preprints.org prior to publication in order to:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea with a time-stamped preprint;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (124 papers)

23 pages, 5755 KiB  
Article
Iris Recognition System Using Advanced Segmentation Techniques and Fuzzy Clustering Methods for Robotic Control
by Slim Ben Chaabane, Rafika Harrabi and Hassene Seddik
J. Imaging 2024, 10(11), 288; https://doi.org/10.3390/jimaging10110288 - 8 Nov 2024
Viewed by 993
Abstract
The idea of developing a robot controlled by iris movement to assist physically disabled individuals is innovative and has the potential to significantly improve their quality of life, empowering individuals with limited mobility and enhancing their ability to interact with their environment. The main idea of this work revolves around iris recognition from an eye image, specifically identifying the centroid of the iris; the centroid’s position is then used to issue commands to control the robot. This approach leverages iris movement as a means of communication and control, offering a potential breakthrough in assisting individuals with physical disabilities. The proposed method aims to improve the precision and effectiveness of iris recognition by incorporating advanced segmentation techniques and fuzzy clustering methods. Fast gradient filters using a fuzzy inference system (FIS) are employed to separate the iris from its surroundings; the bald eagle search (BES) algorithm is then employed to locate and isolate the iris region; and the fuzzy KNN algorithm is finally applied for the matching process. This combined methodology aims to improve the overall performance of iris recognition systems by leveraging advanced segmentation, search, and classification techniques. The results of the proposed model are validated using the true success rate (TSR) and compared to those of other existing models, highlighting the effectiveness of the proposed method on the 400 tested images representing 40 people. Full article
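The fuzzy KNN matching stage can be illustrated with a minimal sketch of Keller-style fuzzy k-NN, in which each of the k nearest neighbours votes with a weight proportional to an inverse power of its distance; the feature vectors, labels, and parameter values below are hypothetical, not the authors' implementation:

```python
import numpy as np

def fuzzy_knn(train_X, train_y, x, k=3, m=2.0, eps=1e-9):
    """Fuzzy k-NN (after Keller et al.): each of the k nearest neighbours
    votes with weight 1/d^(2/(m-1)); returns normalised class memberships."""
    n_classes = int(train_y.max()) + 1
    d = np.linalg.norm(train_X - x, axis=1)   # distances to all training samples
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] ** (2.0 / (m - 1.0)) + eps)
    u = np.zeros(n_classes)
    for weight, i in zip(w, nearest):
        u[train_y[i]] += weight
    return u / u.sum()                        # memberships sum to 1
```

A crisp decision is then `np.argmax` over the membership vector; the membership values themselves give a confidence that a hard k-NN vote does not.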
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

31 pages, 14397 KiB  
Article
Precision Ice Detection on Power Transmission Lines: A Novel Approach with Multi-Scale Retinex and Advanced Morphological Edge Detection Monitoring
by Nalini Rizkyta Nusantika, Jin Xiao and Xiaoguang Hu
J. Imaging 2024, 10(11), 287; https://doi.org/10.3390/jimaging10110287 - 8 Nov 2024
Viewed by 780
Abstract
Ice accretion on power transmission lines is a dangerous risk that can lead to structural damage or power outages, and current ice-identification techniques have drawbacks, particularly in complex environments. This paper aims to detect the top and bottom boundary lines of ice in power transmission line icing (PTLI) images with low illumination and complex backgrounds. The proposed method integrates multistage image processing techniques, including image enhancement, filtering, thresholding, object isolation, edge detection, and line identification; a binocular camera is used to capture the PTLI images. The effectiveness of the method is evaluated with accuracy, sensitivity, specificity, and precision and compared with existing methods, which it significantly outperforms in both ice detection and thickness measurement: under various conditions it achieves an average detection and isolation accuracy of 98.35%, sensitivity of 91.63%, specificity of 99.42%, and precision of 96.03%. Furthermore, the thickness measurements achieve a much smaller RMSE of 1.20 mm, an MAE of 1.10 mm, and an R-squared of 0.95. The proposed scheme thus provides a more accurate and reliable method for monitoring ice formation on power transmission lines. Full article
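The multi-scale Retinex enhancement named in the title is commonly computed as the average, over several Gaussian scales, of log(image) minus log(blurred image); a minimal NumPy sketch, with illustrative scale values rather than the paper's settings:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur using 1-D convolutions (edge padding)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    pad = np.pad(img, r, mode='edge')
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode='valid'), 1, pad)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode='valid'), 0, tmp)

def multi_scale_retinex(img, sigmas=(15, 80, 250)):
    """MSR: average of log(image) - log(Gaussian-blurred image) over scales."""
    img = img.astype(float) + 1.0                      # avoid log(0)
    out = np.zeros_like(img)
    for s in sigmas:
        out += np.log(img) - np.log(gaussian_blur(img, s) + 1e-6)
    return out / len(sigmas)
```

The output compresses illumination variation, which is why MSR is a common pre-processing step for low-illumination scenes before edge detection.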

16 pages, 6259 KiB  
Article
Spectrogram-Based Arrhythmia Classification Using Three-Channel Deep Learning Model with Feature Fusion
by Alaa Eleyan, Fatih Bayram and Gülden Eleyan
Appl. Sci. 2024, 14(21), 9936; https://doi.org/10.3390/app14219936 - 30 Oct 2024
Cited by 2 | Viewed by 1035
Abstract
This paper introduces a novel deep learning model for ECG signal classification using feature fusion. The proposed methodology transforms the ECG time series into a spectrogram image using a short-time Fourier transform (STFT). This spectrogram is further processed to generate a histogram of oriented gradients (HOG) and local binary pattern (LBP) features. Three separate 2D convolutional neural networks (CNNs) then analyze these three image representations in parallel. To enhance performance, the extracted features are concatenated before feeding them into a gated recurrent unit (GRU) model. The proposed approach is extensively evaluated on two ECG datasets (MIT-BIH + BIDMC and MIT-BIH) with three and five classes, respectively. The experimental results demonstrate that the proposed approach achieves superior classification accuracy compared to existing algorithms in the literature. This suggests that the model has the potential to be a valuable tool for accurate ECG signal classification, aiding in the diagnosis and treatment of various cardiovascular disorders. Full article
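The front end of this pipeline, an STFT magnitude spectrogram plus an LBP texture image computed from it, can be sketched as follows; window, hop, and neighbourhood choices are illustrative, and the HOG branch and the CNN/GRU classifier are omitted:

```python
import numpy as np

def stft_spectrogram(x, n_fft=64, hop=16):
    """Magnitude spectrogram: Hann-windowed frames -> real FFT -> magnitude."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop:i*hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T      # (freq_bins, time_frames)

def lbp_image(img):
    """Basic 8-neighbour local binary pattern of a 2-D array."""
    c = img[1:-1, 1:-1]                               # interior pixels
    shifts = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    out = np.zeros(c.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]
        out |= (nb >= c).astype(np.uint8) << np.uint8(bit)
    return out
```

Each of the three image representations (spectrogram, HOG, LBP) would then feed its own 2D CNN before feature concatenation.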

19 pages, 4843 KiB  
Article
Long-Range Bird Species Identification Using Directional Microphones and CNNs
by Tiago Garcia, Luís Pina, Magnus Robb, Jorge Maria, Roel May and Ricardo Oliveira
Mach. Learn. Knowl. Extr. 2024, 6(4), 2336-2354; https://doi.org/10.3390/make6040115 - 16 Oct 2024
Viewed by 1127
Abstract
This study explores the integration of directional microphones with convolutional neural networks (CNNs) for long-range bird species identification. By employing directional microphones, we aimed to capture high-resolution audio from specific directions, potentially improving the clarity of bird calls over extended distances. Our approach involved processing these recordings with CNNs trained on a diverse dataset of bird calls. The results demonstrated that the system is capable of systematically identifying bird species up to 150 m, reaching 280 m for species vocalizing at frequencies greater than 1000 Hz and clearly distinct from background noise. The furthest successful detection was obtained at 510 m. While the method showed promise in enhancing the identification process compared to traditional techniques, there were notable limitations in the clarity of the audio recordings. These findings suggest that while the integration of directional microphones and CNNs for long-range bird species identification is promising, further refinement is needed to fully realize the benefits of this approach. Future efforts should focus on improving the audio-capture technology to reduce ambient noise and enhance the system’s overall performance in long-range bird species identification. Full article

19 pages, 5759 KiB  
Article
Fully Automatic Grayscale Image Segmentation: Dynamic Thresholding for Background Adaptation, Improved Image Center Point Selection, and Noise-Resilient Start/End Point Determination
by Junyan Li and Xuewen Gui
Appl. Sci. 2024, 14(20), 9303; https://doi.org/10.3390/app14209303 - 12 Oct 2024
Cited by 1 | Viewed by 1414
Abstract
As the requirement for image uploads in various systems continues to grow, image segmentation has become a critical task for subsequent operations. Balancing the efficiency and accuracy of image segmentation is a persistent challenge. This paper focuses on threshold-based grayscale image segmentation methods and proposes a fully automated approach. The approach begins with the implementation of an improved OTSU algorithm to determine the optimal dynamic threshold, enabling the segmentation process to adjust adaptively to varying image backgrounds. A novel method for selecting image center points is introduced to address the issue of poor segmentation when the center point falls outside the segmentation foreground area. To further enhance the algorithm’s generalization capability and accuracy, a continuity detection-based method is developed to determine the start and end points of the segmentation foreground. Compared with traditional algorithms, tests on sample images of four different scales revealed that the proposed algorithm achieved average improvements in accuracy, precision, and recall rates of 14.97%, 1.28%, and 17.33%, respectively, with processing speed remaining largely unaffected. Ablation experiments further validated the effectiveness of using different strategy combinations, with the combination of all three strategies resulting in significant improvements in accuracy and recall rates by 15.51% and 16.72%, respectively. Full article
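The dynamic-threshold stage builds on Otsu's classic method, which picks the grayscale threshold that maximises the between-class variance of the histogram. A baseline (unimproved) version can be sketched as:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold maximising between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                 # normalised histogram
    bins = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()  # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:t] * bins[:t]).sum() / w0
        mu1 = (p[t:] * bins[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

The paper's improvement adapts this threshold to varying backgrounds; the sketch above is the single-global-threshold starting point.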

19 pages, 4840 KiB  
Article
High-Quality Image Compression Algorithm Design Based on Unsupervised Learning
by Shuo Han, Bo Mo, Jie Zhao, Junwei Xu, Shizun Sun and Bo Jin
Sensors 2024, 24(20), 6503; https://doi.org/10.3390/s24206503 - 10 Oct 2024
Viewed by 1198
Abstract
Ever-growing volumes of image data are constrained by transmission and reconstruction conditions, making it increasingly difficult to meet the speed and integrity requirements of the information age. To address this, this paper proposes a high-quality image compression algorithm based on unsupervised learning. A content-weighted autoencoder network achieves image compression coding at a lower bit rate, solving the entropy-rate optimization problem; binary quantizers perform the coding quantization, and importance maps achieve better bit allocation, further controlling and optimizing the compression rate. A multi-scale discriminator suited to the generative adversarial network image compression framework is designed to counter the blurring and distortion to which generated compressed images are prone. Finally, training with different weights minimizes the distortion at each scale, so that compression achieves a higher-quality compression and reconstruction effect. The experimental results show that the model preserves image detail while greatly reducing storage, and that it can compress large numbers of images quickly and efficiently. Full article
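The pairing of a binary quantizer with an importance map can be sketched as a toy function in which the importance value at each spatial position decides how many feature channels survive; the shapes and the channel-allocation rule here are illustrative assumptions, not the paper's network:

```python
import numpy as np

def quantize_with_importance(features, importance, L=4):
    """Binarise encoder features (C, H, W); at each spatial position keep only
    the first n channels, where n is set by the quantised importance map."""
    C = features.shape[0]
    bits = (features > 0).astype(np.uint8)                 # binary quantiser
    levels = np.ceil(np.clip(importance, 0.0, 1.0) * L)    # L importance levels
    n_keep = levels * (C // L)                             # channels kept per pixel
    mask = np.arange(C)[:, None, None] < n_keep[None, :, :]
    return bits * mask                                     # masked binary code
```

Positions the importance map deems unimportant contribute few (or zero) bits, which is how the bit allocation, and hence the compression rate, is controlled.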

19 pages, 35678 KiB  
Article
Shrimp Larvae Counting Based on Improved YOLOv5 Model with Regional Segmentation
by Hongchao Duan, Jun Wang, Yuan Zhang, Xiangyu Wu, Tao Peng, Xuhao Liu and Delong Deng
Sensors 2024, 24(19), 6328; https://doi.org/10.3390/s24196328 - 30 Sep 2024
Viewed by 1078
Abstract
Counting shrimp larvae is an essential part of shrimp farming. Due to their tiny size and high density, this task is exceedingly difficult. Thus, we introduce an algorithm for counting densely packed shrimp larvae utilizing an enhanced You Only Look Once version 5 (YOLOv5) model through a regional segmentation approach. First, the C2f and convolutional block attention modules are used to improve the capabilities of YOLOv5 in recognizing the small shrimp. Moreover, employing a regional segmentation technique can decrease the receptive field area, thereby enhancing the shrimp counter’s detection performance. Finally, a strategy for stitching and deduplication is implemented to tackle the problem of double counting across various segments. The findings from the experiments indicate that the suggested algorithm surpasses several other shrimp counting techniques in terms of accuracy. Notably, for high-density shrimp larvae in large quantities, this algorithm attained an accuracy exceeding 98%. Full article
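The stitching-and-deduplication strategy can be sketched as follows: detections from overlapping tiles, once mapped back to global image coordinates, are merged by suppressing any box that overlaps an already-kept box beyond an IoU threshold (box format and threshold are illustrative; the YOLOv5 detector itself is omitted):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def merge_tile_detections(tile_boxes, iou_thr=0.5):
    """Stitch per-tile detections (already mapped to global coordinates),
    dropping duplicates produced where overlapping tiles see the same larva."""
    merged = []
    # larger boxes first, so a partial crop at a tile edge loses to the full view
    for box in sorted(tile_boxes, key=lambda b: -(b[2]-b[0]) * (b[3]-b[1])):
        if all(iou(box, kept) < iou_thr for kept in merged):
            merged.append(box)
    return merged
```

The final count is simply `len(merge_tile_detections(...))` over all tiles.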

14 pages, 6582 KiB  
Article
Multi-Temporal Snow-Covered Remote Sensing Image Matching via Image Transformation and Multi-Level Feature Extraction
by Zhitao Fu, Jian Zhang and Bo-Hui Tang
Optics 2024, 5(4), 392-405; https://doi.org/10.3390/opt5040029 - 29 Sep 2024
Viewed by 1060
Abstract
To address the challenge of image matching posed by significant modal differences in remote sensing images influenced by snow cover, this paper proposes an innovative image transformation-based matching method. Initially, the Pix2Pix-GAN conversion network is employed to transform remote sensing images with snow cover into images without snow cover, reducing the feature disparity between the images. This conversion facilitates the extraction of more discernible features for matching by transforming the problem from snow-covered to snow-free images. Subsequently, a multi-level feature extraction network is utilized to extract multi-level feature descriptors from the transformed images. Keypoints are derived from these descriptors, enabling effective feature matching. Finally, the matching results are mapped back onto the original snow-covered remote sensing images. The proposed method was compared to well-established techniques such as SIFT, RIFT2, R2D2, and ReDFeat and demonstrated outstanding performance. In terms of NCM, MP, Rep, Recall, and F1-measure, our method outperformed the state of the art by 177, 0.29, 0.22, 0.21, and 0.25, respectively. In addition, the algorithm shows robustness over a range of image rotation angles from −40° to 40°. This innovative approach offers a new perspective on the task of matching multi-temporal snow-covered remote sensing images. Full article

17 pages, 7240 KiB  
Article
YOLO-BFRV: An Efficient Model for Detecting Printed Circuit Board Defects
by Jiaxin Liu, Bingyu Kang, Chao Liu, Xunhui Peng and Yan Bai
Sensors 2024, 24(18), 6055; https://doi.org/10.3390/s24186055 - 19 Sep 2024
Viewed by 1480
Abstract
The small area of a printed circuit board (PCB) results in densely distributed defects, leading to a lower detection accuracy, which subsequently impacts the safety and stability of the circuit board. This paper proposes a new YOLO-BFRV network model based on the improved YOLOv8 framework to identify PCB defects more efficiently and accurately. First, a bidirectional feature pyramid network (BIFPN) is introduced to expand the receptive field of each feature level and enrich the semantic information to improve the feature extraction capability. Second, the YOLOv8 backbone network is refined into a lightweight FasterNet network, reducing the computational load while improving the detection accuracy of minor defects. Subsequently, the high-speed re-parameterized detection head (RepHead) reduces inference complexity and boosts the detection speed without compromising accuracy. Finally, the VarifocalLoss is employed to enhance the detection accuracy for densely distributed PCB defects. The experimental results demonstrate that the improved model increases the mAP by 4.12% compared to the benchmark YOLOv8s model, boosts the detection speed by 45.89%, and reduces the GFLOPs by 82.53%, further confirming the superiority of the algorithm presented in this paper. Full article

15 pages, 5499 KiB  
Article
Correlating Histopathological Microscopic Images of Creutzfeldt–Jakob Disease with Clinical Typology Using Graph Theory and Artificial Intelligence
by Carlos Martínez, Susana Teijeira, Patricia Domínguez, Silvia Campanioni, Laura Busto, José A. González-Nóvoa, Jacobo Alonso, Eva Poveda, Beatriz San Millán and César Veiga
Mach. Learn. Knowl. Extr. 2024, 6(3), 2018-2032; https://doi.org/10.3390/make6030099 - 7 Sep 2024
Viewed by 1106
Abstract
Creutzfeldt–Jakob disease (CJD) is a rare, degenerative, and fatal brain disorder caused by abnormal proteins called prions. This research introduces a novel approach combining AI and graph theory to analyze histopathological microscopic images of brain tissues affected by CJD. The detection and quantification of spongiosis, characterized by the presence of vacuoles in the brain tissue, plays a crucial role in aiding the accurate diagnosis of CJD. The proposed methodology employs image processing techniques to identify these pathological features in high-resolution medical images. By developing an automatic pipeline for the detection of spongiosis, we aim to overcome some limitations of manual feature extraction. The results demonstrate that our method correctly identifies and characterizes spongiosis and allows the extraction of features that will help to better understand spongiosis patterns in different CJD patients. Full article
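The detection and quantification of vacuoles can be illustrated with a minimal thresholding-plus-connected-components sketch; the threshold, 4-connectivity, and minimum-area filter are hypothetical choices, not the authors' pipeline:

```python
import numpy as np
from collections import deque

def count_vacuoles(gray, thr=200, min_area=3):
    """Threshold bright (vacuole-like) regions and count connected
    components above a minimum area, via BFS flood fill (4-connectivity)."""
    mask = gray > thr
    seen = np.zeros_like(mask, dtype=bool)
    H, W = mask.shape
    count = 0
    for y in range(H):
        for x in range(W):
            if mask[y, x] and not seen[y, x]:
                area, q = 0, deque([(y, x)])
                seen[y, x] = True
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if area >= min_area:        # drop specks below min_area
                    count += 1
    return count
```

Per-component areas and centroids, rather than just the count, would feed the graph-theoretic analysis of spongiosis patterns.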

17 pages, 3025 KiB  
Article
A Deep Learning Framework for Real-Time Bird Detection and Its Implications for Reducing Bird Strike Incidents
by Najiba Said Hamed Alzadjali, Sundaravadivazhagan Balasubaramainan, Charles Savarimuthu and Emanuel O. Rances
Sensors 2024, 24(17), 5455; https://doi.org/10.3390/s24175455 - 23 Aug 2024
Cited by 1 | Viewed by 2488
Abstract
Bird strikes are a substantial aviation safety issue that can result in serious damage to aircraft components and even passenger deaths. In response, the implementation of newer and more efficient detection and prevention technologies is urgent. This paper presents a novel deep learning model developed to detect and mitigate bird strike risks in airport conditions, boosting aircraft safety. Based on an extensive database of bird images covering different species and flight patterns, the research adopts sophisticated image augmentation techniques that generate multiple aircraft-operation scenarios, ensuring that the model is robust under different conditions. The methodology centers on a spatiotemporal convolutional neural network that employs spatial attention structures together with dynamic temporal processing to precisely recognize flying birds. A key feature of this research is its dual-focus architecture, which consists of two components: an attention-based temporal analysis network and a convolutional neural network with spatial awareness. This architecture can identify specific features nested in a crowded and shifting backdrop, lowering false positives and improving detection accuracy, while the attention mechanisms sharpen the model's focus on the vital features of bird flight patterns. The proposed model achieves better accuracy and real-time response than existing bird detection systems, and an ablation study demonstrates the indispensable role of each component, confirming their synergistic effect on detection performance. The research substantiates the model's applicability as part of an airport bird strike surveillance system, providing an alternative prevention strategy and a large-scale, reliable tool for dealing with the bird strike problem. Full article

26 pages, 40263 KiB  
Article
A Fast and High-Accuracy Foreign Object Detection Method for Belt Conveyor Coal Flow Images with Target Occlusion
by Hongwei Fan, Jinpeng Liu, Xinshan Yan, Chao Zhang, Xiangang Cao and Qinghua Mao
Sensors 2024, 24(16), 5251; https://doi.org/10.3390/s24165251 - 14 Aug 2024
Viewed by 1201
Abstract
Foreign objects in coal flow easily cause damage to conveyor belts, and most foreign objects are often occluded, making them difficult to detect. Aiming at solving the problems of low accuracy and efficiency in the detection of occluded targets in a low-illumination and dust fog environment, an image detection method for foreign objects is proposed. Firstly, YOLOv5s back-end processing is optimized by soft non-maximum suppression to reduce the influence of dense objects. Secondly, SimOTA label allocation is used to reduce the influence of ambiguous samples under dense occlusion. Then, Slide Loss is used to excavate difficult samples, and Inner–SIoU is used to optimize the bounding box regression loss. Finally, Group–Taylor pruning is used to compress the model. The experimental results show that the proposed method has only 4.20 × 105 parameters, a computational amount of 1.00 × 109, a model size of 1.20 MB, and an mAP0.5 of up to 91.30% on the self-built dataset. The detection speed on the different computing devices is as high as 66.31, 41.90, and 33.03 FPS. This proves that the proposed method achieves fast and high-accuracy detection of multi-layer occluded coal flow foreign objects. Full article
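The soft non-maximum suppression step replaces hard deletion of overlapping boxes with a score decay, which helps retain genuinely distinct but densely packed objects; a minimal Gaussian soft-NMS sketch with illustrative parameters:

```python
import numpy as np

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / (union + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
    """Gaussian soft-NMS: decay overlapping boxes' scores by exp(-IoU^2/sigma)
    instead of deleting them outright; returns indices of surviving boxes."""
    scores = scores.astype(float).copy()
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        keep.append(best)
        remaining.remove(best)
        for i in remaining:
            scores[i] *= np.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
        remaining = [i for i in remaining if scores[i] > score_thr]
    return keep
```

A true duplicate (IoU near 1) is decayed until it drops below the score threshold, while a nearby but distinct object with moderate overlap survives with a reduced score.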

18 pages, 9444 KiB  
Article
Specifics of Data Collection and Data Processing during Formation of RailVista Dataset for Machine Learning- and Deep Learning-Based Applications
by Gulsipat Abisheva, Nikolaj Goranin, Bibigul Razakhova, Tolegen Aidynov and Dina Satybaldina
Sensors 2024, 24(16), 5239; https://doi.org/10.3390/s24165239 - 13 Aug 2024
Viewed by 1120
Abstract
This paper presents the methodology and outcomes of creating the RailVista dataset, designed for detecting defects on railway tracks using machine and deep learning techniques. The dataset comprises 200,000 high-resolution images categorized into 19 distinct classes covering various railway infrastructure defects. The data collection involved a meticulous process including complex image capture methods, distortion techniques for data enrichment, and secure storage in a data warehouse using efficient binary file formats. This structured dataset facilitates effective training of machine/deep learning models, enhancing automated defect detection systems in railway safety and maintenance applications. The study underscores the critical role of high-quality datasets in advancing machine learning applications within the railway domain, highlighting future prospects for improving safety and reliability through automated recognition technologies. Full article

27 pages, 13847 KiB  
Article
RailTrack-DaViT: A Vision Transformer-Based Approach for Automated Railway Track Defect Detection
by Aniwat Phaphuangwittayakul, Napat Harnpornchai, Fangli Ying and Jinming Zhang
J. Imaging 2024, 10(8), 192; https://doi.org/10.3390/jimaging10080192 - 7 Aug 2024
Cited by 1 | Viewed by 2013
Abstract
Railway track defects pose significant safety risks and can lead to accidents, economic losses, and loss of life. Traditional manual inspection methods are time-consuming, costly, and prone to human error. This paper proposes RailTrack-DaViT, a novel vision transformer-based approach for railway track defect classification. By leveraging the Dual Attention Vision Transformer (DaViT) architecture, RailTrack-DaViT effectively captures both global and local information, enabling accurate defect detection. The model is trained and evaluated on multiple datasets including rail, fastener and fishplate, multi-faults, and ThaiRailTrack. A comprehensive analysis of the model’s performance is provided including confusion matrices, training visualizations, and classification metrics. RailTrack-DaViT demonstrates superior performance compared to state-of-the-art CNN-based methods, achieving the highest accuracies: 96.9% on the rail dataset, 98.9% on the fastener and fishplate dataset, and 98.8% on the multi-faults dataset. Moreover, RailTrack-DaViT outperforms baselines on the ThaiRailTrack dataset with 99.2% accuracy, quickly adapts to unseen images, and shows better model stability during fine-tuning. This capability can significantly reduce time consumption when applying the model to novel datasets in practical applications. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

22 pages, 18174 KiB  
Article
Research on Pupil Center Localization Detection Algorithm with Improved YOLOv8
by Kejuan Xue, Jinsong Wang and Hao Wang
Appl. Sci. 2024, 14(15), 6661; https://doi.org/10.3390/app14156661 - 30 Jul 2024
Viewed by 1220
Abstract
Addressing issues such as low localization accuracy, poor robustness, and long average localization time in pupil center localization algorithms, an improved YOLOv8 network-based pupil center localization algorithm is proposed. This algorithm incorporates a dual attention mechanism into the YOLOv8n backbone network, which attends to the global contextual information of the input data while reducing dependence on specific regions. This mitigates the difficulty of pupil localization under occlusions such as eyelashes and eyelids, enhancing the model’s robustness. Additionally, atrous convolutions are introduced in the encoding section, which shrink the network model while improving its detection speed. The Focaler-IoU loss function, by focusing on different regression samples, improves detector performance across various detection tasks. The improved YOLOv8n algorithm achieved a precision of 0.99971, recall of 1, mAP50 of 0.99611, and mAP50-95 of 0.96495. Moreover, it reduced the model parameters by 7.18% and the computational complexity by 10.06%, while enhancing environmental anti-interference ability and robustness and shortening localization time for real-time detection. Full article
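The Focaler-IoU loss mentioned in this abstract works by linearly remapping IoU over an interval so that the regression loss focuses on samples of a chosen difficulty. A minimal sketch of that remapping, following the published Focaler-IoU formulation (the interval bounds d and u below are illustrative defaults, not values from the paper):

```python
def focaler_iou(iou, d=0.0, u=0.95):
    """Focaler-IoU: linearly remap an IoU value onto [0, 1] over [d, u].

    IoU values below d are clamped to 0 and values above u to 1, so the
    resulting loss (1 - focaler_iou) concentrates gradient on boxes whose
    overlap falls inside the chosen difficulty interval.
    """
    if iou <= d:
        return 0.0
    if iou >= u:
        return 1.0
    return (iou - d) / (u - d)
```

The corresponding loss term is then `1 - focaler_iou(iou)`, which can be combined with a distance or aspect-ratio penalty as in CIoU-style losses.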
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

20 pages, 31926 KiB  
Article
Deep Learning and Histogram-Based Grain Size Analysis of Images
by Wei Wei, Xiaohong Xu, Guangming Hu, Yanlin Shao and Qing Wang
Sensors 2024, 24(15), 4923; https://doi.org/10.3390/s24154923 - 30 Jul 2024
Viewed by 1354
Abstract
Grain size analysis is used to study grain size and distribution. It is a critical indicator in sedimentary simulation experiments (SSEs), which aids in understanding hydrodynamic conditions and identifying the features of sedimentary environments. Existing methods for grain size analysis based on images primarily focus on scenarios where grain edges are distinct or grain arrangements are regular. However, these methods are not suitable for images from SSEs. We proposed a deep learning model incorporating histogram layers for the analysis of SSE images with fuzzy grain edges and irregular arrangements. Firstly, ResNet18 was used to extract features from SSE images. These features were then input into the histogram layer to obtain local histogram features, which were concatenated to form comprehensive histogram features for the entire image. Finally, the histogram features were connected to a fully connected layer to estimate the grain size corresponding to the cumulative volume percentage. In addition, an applied workflow was developed. The results demonstrate that the proposed method achieved higher accuracy than the eight other models and was highly consistent with manual results in practice. The proposed method enhances the efficiency and accuracy of grain size analysis for images with irregular grain distribution and improves the quantification and automation of grain size analysis in SSEs. It can also be applied for grain size analysis in fields such as soil and geotechnical engineering. Full article
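The histogram layer described above can be understood as differentiable soft binning: each extracted feature votes into every bin through an RBF kernel, so the binning stays smooth and trainable end to end. A NumPy sketch under that assumption (the bin centers and width below are illustrative, not the paper's values):

```python
import numpy as np

def soft_histogram(features, centers, width):
    """Differentiable RBF soft-binning histogram over a 1-D feature vector.

    Each feature contributes exp(-(x - c)^2 / (2 w^2)) to every bin center c;
    per-feature contributions are normalized to unit mass, then averaged, so
    the returned histogram sums to 1.
    """
    x = features[:, None]                                # (N, 1)
    c = centers[None, :]                                 # (1, B)
    resp = np.exp(-((x - c) ** 2) / (2.0 * width ** 2))  # (N, B) bin votes
    resp /= resp.sum(axis=1, keepdims=True)              # unit mass per feature
    return resp.mean(axis=0)                             # (B,) histogram
```

In the paper's setting such a layer would sit on top of ResNet18 feature maps; here a plain 1-D vector stands in for those features.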
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

24 pages, 8432 KiB  
Article
Lane Attribute Classification Based on Fine-Grained Description
by Zhonghe He, Pengfei Gong, Hongcheng Ye and Zizheng Gan
Sensors 2024, 24(15), 4800; https://doi.org/10.3390/s24154800 - 24 Jul 2024
Cited by 3 | Viewed by 879
Abstract
As an indispensable part of the vehicle environment perception task, road traffic marking detection plays a vital role in correctly understanding the current traffic situation. However, existing traffic marking detection algorithms still have limitations. Taking lane detection as an example, current methods mainly focus on detecting the location of lane lines and judge only one overall attribute per detected lane instance, lacking finer-grained dynamic detection of lane attributes. To meet the needs of intelligent vehicles for dynamic lane attribute detection and richer road environment information in urban driving, this paper constructs a fine-grained attribute detection method for lane lines, termed Lane-FGA: pixel-level attribute sequence points describe the complete attribute distribution of each lane line and are then matched to the detected lane instances, enabling attribute judgment at different segment positions along a lane. In addition, in view of the lack of such annotation in current open-source lane datasets, this paper constructs a lane dataset with both lane instance information and fine-grained attribute information by combining manual and intelligent annotation. A cyclic iterative attribute inference algorithm is also designed to solve the difficult problem of labeling lane attributes in areas without visual cues, such as occluded or damaged regions. The proposed algorithm reaches an average accuracy of 97% across various types of lane attribute detection. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

22 pages, 9246 KiB  
Article
DST-DETR: Image Dehazing RT-DETR for Safety Helmet Detection in Foggy Weather
by Ziyuan Liu, Chunxia Sun and Xiaopeng Wang
Sensors 2024, 24(14), 4628; https://doi.org/10.3390/s24144628 - 17 Jul 2024
Cited by 1 | Viewed by 1500
Abstract
In foggy weather, outdoor safety helmet detection often suffers from low visibility and unclear objects, hindering optimal detector performance. Moreover, safety helmets typically appear as small objects at construction sites, prone to occlusion and difficult to distinguish from complex backgrounds, further exacerbating the detection challenge. Therefore, the real-time and precise detection of safety helmet usage among construction personnel, particularly in adverse weather conditions such as foggy weather, poses a significant challenge. To address this issue, this paper proposes the DST-DETR, a framework for foggy weather safety helmet detection. The DST-DETR framework comprises a dehazing module, PAOD-Net, and an object detection module, ST-DETR, for joint dehazing and detection. Initially, foggy images are restored within PAOD-Net, enhancing the AOD-Net model by introducing a novel convolutional module, PfConv, guided by the parameter-free average attention module (PfAAM). This module enables more focused attention on crucial features in lightweight models, therefore enhancing performance. Subsequently, the MS-SSIM + ℓ2 loss function is employed to bolster the model’s robustness, making it adaptable to scenes with intricate backgrounds and variable fog densities. Next, within the object detection module, the ST-DETR model is designed to address small objects. By refining the RT-DETR model, its capability to detect small objects in low-quality images is enhanced. The core of this approach lies in utilizing the variant ResNet-18 as the backbone to make the network lightweight without sacrificing accuracy, followed by effectively integrating the small-object layer into the improved BiFPN neck structure, resulting in CCFF-BiFPN-P2. Various experiments were conducted to qualitatively and quantitatively compare our method with several state-of-the-art approaches, demonstrating its superiority. The results validate that the DST-DETR algorithm is better suited for foggy safety helmet detection tasks in construction scenarios. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

18 pages, 4141 KiB  
Article
High-Performance Binocular Disparity Prediction Algorithm for Edge Computing
by Yuxi Cheng, Yang Song, Yi Liu, Hui Zhang and Feng Liu
Sensors 2024, 24(14), 4563; https://doi.org/10.3390/s24144563 - 14 Jul 2024
Viewed by 1128
Abstract
End-to-end disparity estimation algorithms based on cost volume, when deployed on edge neural-network accelerators, face the problem of structural adaptation and must maintain accuracy while using only the operators the accelerator supports. Therefore, this paper proposes a novel disparity calculation algorithm that uses low-rank approximation to replace 3D convolution and transposed 3D convolution, WReLU to reduce the data compression caused by the activation function, and unimodal cost volume filtering with a confidence estimation network to regularize the cost volume. This alleviates the problem of the disparity-matching cost distribution deviating from the true distribution and greatly reduces the computational complexity and number of parameters of the algorithm while improving accuracy. Experimental results show that compared with a typical disparity estimation network, the absolute error of the proposed algorithm is reduced by 38.3%, the three-pixel error is reduced to 1.41%, and the number of parameters is reduced by 67.3%. The calculation accuracy is better than that of other algorithms, the network is easier to deploy, and it has strong structural adaptability and good practicability. Full article
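To see why a low-rank substitution for 3D convolution cuts parameters so sharply, compare weight counts when a dense k×k×k kernel is factorized into three 1-D convolutions through a rank-r bottleneck. This is an illustrative decomposition, not necessarily the paper's exact scheme:

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a dense 3-D convolution with a k*k*k kernel (bias ignored)."""
    return c_in * c_out * k ** 3

def lowrank_params(c_in, c_out, k, r):
    """Weights when the kernel is factorized into three 1-D convolutions
    (k,1,1), (1,k,1), (1,1,k) through a rank-r channel bottleneck."""
    return c_in * r * k + r * r * k + r * c_out * k

full = conv3d_params(32, 32, 3)        # 27648 weights
low = lowrank_params(32, 32, 3, 16)    # 3840 weights
print(full, low, 1 - low / full)       # roughly 86% fewer parameters
```

The same factorization applies to transposed 3D convolution, which is why the combined network sees such a large parameter reduction.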
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

30 pages, 8210 KiB  
Article
Multi-Source Feature-Fusion Method for the Seismic Data of Cultural Relics Based on Deep Learning
by Lin He, Quan Wei, Mengting Gong, Xiaofei Yang and Jianming Wei
Sensors 2024, 24(14), 4525; https://doi.org/10.3390/s24144525 - 12 Jul 2024
Viewed by 856
Abstract
The museum system is exposed to a high risk of seismic hazards. However, it is difficult to carry out seismic hazard prevention to protect cultural relics in collections due to the lack of real data and the diversity of seismic hazards. To address this problem, we developed a deep-learning-based multi-source feature-fusion method to assess seismic damage data for cultural relics in collections. Firstly, a multi-source data-processing strategy was developed according to the needs of seismic impact analysis for collected cultural relics, and a seismic event-ontology model of cultural relics was constructed. Additionally, a seismic damage data-classification acquisition method and an empirical calculation model were designed. Secondly, we proposed a deep learning-based multi-source feature-fusion matching method for cultural relics. By constructing a damage state assessment model using superpixel graph-convolutional fusion and an automatic data-matching model, the quality and processing efficiency of the seismic damage data were improved. Finally, we formed a dataset oriented to seismic damage risk analysis of cultural relics in collections. The experimental results show that the accuracy of this method reaches 93.6%, and the accuracy of cultural relic label matching is as high as 82.6%, outperforming a range of earthquake damage state assessment models. This method can provide more accurate and efficient data support, along with a scientific basis, for subsequent research on seismic damage impact analysis of cultural relics in collections. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

18 pages, 5365 KiB  
Article
A Novel Approach for Meat Quality Assessment Using an Ensemble of Compact Convolutional Neural Networks
by Poonguzhali Elangovan, Vijayalakshmi Dhurairajan, Malaya Kumar Nath, Pratheepan Yogarajah and Joan Condell
Appl. Sci. 2024, 14(14), 5979; https://doi.org/10.3390/app14145979 - 9 Jul 2024
Viewed by 1382
Abstract
The rising awareness of nutritional values has led to an increase in the popularity of meat-based diets. Hence, to ensure public health, industries and consumers are focusing more on the quality and freshness of this food. Traditional meat quality assessment methods can be expensive and destructive. Furthermore, they are subjective and reliant on the knowledge of specialists. Fully automated computer-aided diagnosis systems are required to eradicate the variability among experts. However, evaluating the quality of meat automatically is challenging. Deep convolutional neural networks have shown tremendous improvement in meat quality assessment. This research utilizes an ensemble framework of shallow convolutional neural networks for assessing the quality and freshness of meat. Two compact CNN architectures (ConvNet-18 and ConvNet-24) are developed, and the efficacy of the models is evaluated using two publicly available databases. Experimental findings reveal that ConvNet-18 outperforms other state-of-the-art models in classifying fresh and spoiled meat with an overall accuracy of 99.4%, whereas ConvNet-24 shows a better outcome in categorizing meat by its freshness, yielding an accuracy of 96.6%, much better than standard models. Furthermore, the suggested models effectively detect the quality and freshness of meat with less complexity than existing state-of-the-art techniques. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

16 pages, 3137 KiB  
Article
Varroa Mite Counting Based on Hyperspectral Imaging
by Amira Ghezal, Christian Jair Luis Peña and Andreas König
Sensors 2024, 24(14), 4437; https://doi.org/10.3390/s24144437 - 9 Jul 2024
Cited by 1 | Viewed by 1095
Abstract
Varroa mite infestation poses a severe threat to honeybee colonies globally. This study investigates the feasibility of utilizing the HS-Cam and machine learning techniques for Varroa mite counting. The methodology involves image acquisition, dimensionality reduction through Principal Component Analysis (PCA), and machine learning-based segmentation and classification algorithms. Specifically, a k-Nearest Neighbors (kNN) model distinguishes Varroa mites from other objects in the images, while a Support Vector Machine (SVM) classifier enhances shape detection. The final phase integrates a dedicated counting algorithm, leveraging outputs from the SVM classifier to quantify Varroa mite populations in hyperspectral images. The preliminary results demonstrate segmentation accuracy exceeding 99% and an average precision of 0.9983 and recall of 0.9947 across all the classes. The results obtained from our machine learning-based approach for Varroa mite counting were compared against ground-truth labels obtained through manual counting, demonstrating a high degree of agreement between the automated counting and manual ground truth. Despite working with a limited dataset, the HS-Cam showcases its potential for Varroa counting, delivering superior performance compared to traditional RGB images. Future research directions include validating the proposed hyperspectral imaging methodology with a more extensive and diverse dataset. Additionally, the effectiveness of using a near-infrared (NIR) excitation source for Varroa detection will be explored, along with assessing smartphone integration feasibility. Full article
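The reduce-then-classify pipeline described above can be sketched on synthetic stand-in data: project high-dimensional spectra onto their top principal components, then classify pixels by nearest neighbor in the reduced space. Everything below (band count, class separation, number of components, the nearest-neighbor stand-in for the paper's kNN/SVM stages) is illustrative, not from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for labeled hyperspectral pixels: 200 samples, 50 bands.
# Class 0 ~ background spectra, class 1 ~ "mite-like" spectra (shifted mean).
X = rng.normal(0.0, 1.0, (200, 50))
y = np.repeat([0, 1], 100)
X[y == 1] += 1.5

# PCA via SVD on centered data: keep the top 3 principal components.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
Z = (X - mu) @ Vt[:3].T

# 1-nearest-neighbor classification in the reduced space (leave-one-out).
d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)          # a sample may not match itself
pred = y[d.argmin(axis=1)]
print("LOO accuracy:", (pred == y).mean())
```

A counting stage would then group the classified mite pixels into connected components and report the component count.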
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

14 pages, 8243 KiB  
Article
Evaluation Metrics for Generative Models: An Empirical Study
by Eyal Betzalel, Coby Penso and Ethan Fetaya
Mach. Learn. Knowl. Extr. 2024, 6(3), 1531-1544; https://doi.org/10.3390/make6030073 - 7 Jul 2024
Viewed by 2803
Abstract
Generative models such as generative adversarial networks, diffusion models, and variational auto-encoders have become prevalent in recent years. While these models have shown remarkable results, evaluating their performance is challenging. This issue is of vital importance to push research forward and to distinguish meaningful gains from random noise. Currently, heuristic metrics such as the inception score (IS) and Fréchet inception distance (FID) are the most common evaluation metrics, but what they measure is not entirely clear. Additionally, there are questions regarding how meaningful their score actually is. In this work, we propose a novel evaluation protocol for likelihood-based generative models, based on generating a high-quality synthetic dataset on which we can estimate classical metrics for comparison. This new scheme harnesses the advantages of knowing the underlying likelihood values of the data by measuring the divergence between the model-generated data and the synthetic dataset. Our study shows that while FID and IS correlate with several f-divergences, their ranking of close models can vary considerably, making them problematic when used for fine-grained comparison. We further use this experimental setting to study which evaluation metric best correlates with our probabilistic metrics. Full article
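For intuition about what FID measures: the Fréchet distance between two Gaussians N(mu1, S1) and N(mu2, S2) has the closed form ||mu1 − mu2||² + Tr(S1 + S2 − 2(S1·S2)^{1/2}). The general case needs a matrix square root, but with diagonal covariances it reduces to an elementwise expression, sketched here:

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances.

    For diagonal covariance matrices the matrix square root in
    ||mu1-mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}) is elementwise,
    so the trace term becomes sum(v1 + v2 - 2 sqrt(v1 * v2)).
    """
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    return (np.sum((mu1 - mu2) ** 2)
            + np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2)))
```

FID applies this formula to the means and (full) covariances of Inception features computed from real and generated images; full covariances require `scipy.linalg.sqrtm` or similar.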
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

21 pages, 15533 KiB  
Article
Performance Assessment of Object Detection Models Trained with Synthetic Data: A Case Study on Electrical Equipment Detection
by David O. Santos, Jugurta Montalvão, Charles A. C. Araujo, Ulisses D. E. S. Lebre, Tarso V. Ferreira and Eduardo O. Freire
Sensors 2024, 24(13), 4219; https://doi.org/10.3390/s24134219 - 28 Jun 2024
Viewed by 1109
Abstract
This paper explores a data augmentation approach for images of rigid bodies, particularly focusing on electrical equipment and analogous industrial objects. By leveraging manufacturer-provided datasheets containing precise equipment dimensions, we employed straightforward algorithms to generate synthetic images, permitting the expansion of the training dataset from a potentially unlimited viewpoint. In scenarios lacking genuine target images, we conducted a case study using two well-known detectors, representing two machine-learning paradigms: the Viola–Jones (VJ) and You Only Look Once (YOLO) detectors, trained exclusively on datasets featuring synthetic images as the positive examples of the target equipment, namely lightning rods and potential transformers. Performances of both detectors were assessed using real images in both visible and infrared spectra. YOLO consistently demonstrates F1 scores below 26% in both spectra, while VJ’s scores lie in the interval from 38% to 61%. This performance discrepancy is discussed in view of paradigms’ strengths and weaknesses, whereas the relatively high scores of at least one detector are taken as empirical evidence in favor of the proposed data augmentation approach. Full article
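Generating synthetic training views from datasheet dimensions rests on standard pinhole projection: given an object's known 3-D geometry, any viewpoint can be rendered by choosing a camera pose. A minimal sketch of that projection step (the equipment size, intrinsics, and pose below are hypothetical, not taken from the paper):

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project 3-D world points (N, 3) to pixel coordinates with a pinhole camera."""
    cam = points_3d @ R.T + t          # world -> camera frame
    uvw = cam @ K.T                    # apply intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

# Hypothetical 1 m x 0.5 m equipment face (as read from a datasheet).
face = np.array([[-0.5, -0.25, 0.0], [0.5, -0.25, 0.0],
                 [0.5, 0.25, 0.0], [-0.5, 0.25, 0.0]])
K = np.array([[800.0, 0.0, 320.0],    # toy intrinsics: f=800 px,
              [0.0, 800.0, 240.0],    # principal point (320, 240)
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # camera facing the object head-on
t = np.array([0.0, 0.0, 4.0])          # object 4 m in front of the camera
print(project(face, K, R, t))
```

Sweeping R and t over many poses yields the "potentially unlimited viewpoint" expansion of positive training examples described in the abstract.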
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

27 pages, 4105 KiB  
Article
Pollen Grain Classification Using Some Convolutional Neural Network Architectures
by Benjamin Garga, Hamadjam Abboubakar, Rodrigue Saoungoumi Sourpele, David Libouga Li Gwet and Laurent Bitjoka
J. Imaging 2024, 10(7), 158; https://doi.org/10.3390/jimaging10070158 - 28 Jun 2024
Cited by 4 | Viewed by 1140
Abstract
The main objective of this work is to use convolutional neural networks (CNNs) to improve on previous baselines for pollen grain classification by improving the performance of eight popular architectures: InceptionV3, VGG16, VGG19, ResNet50, NASNet, Xception, DenseNet201, and InceptionResNetV2, which are benchmarks on several classification tasks, such as the ImageNet dataset. We use a well-known annotated public image dataset for the Brazilian savanna, called POLLEN73S, composed of 2523 images, and evaluate with holdout cross-validation. The experiments carried out show that DenseNet201 and ResNet50 outperform the other CNNs tested, achieving accuracies of 97.217% and 94.257%, respectively, higher than the existing results by 1.517% and 0.257%. VGG19 is the architecture with the lowest performance, achieving an accuracy of 89.463%. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

21 pages, 86652 KiB  
Article
Toward Unbiased High-Quality Portraits through Latent-Space Evaluation
by Doaa Almhaithawi, Alessandro Bellini and Tania Cerquitelli
J. Imaging 2024, 10(7), 157; https://doi.org/10.3390/jimaging10070157 - 28 Jun 2024
Viewed by 1455
Abstract
Images, texts, voices, and signals can be represented as multidimensional vectors in a latent space, which can be explored without the hurdles of noise or other interfering factors. In this paper, we present a practical use case that demonstrates the power of latent space in exploring complex realities such as image space. We focus on DaVinciFace, an AI-based system that explores the StyleGAN2 space to create a high-quality portrait for anyone in the style of the Renaissance genius Leonardo da Vinci. The user enters one of their portraits and receives the corresponding Da Vinci-style portrait as an output. Since most of Da Vinci’s artworks depict young and beautiful women (e.g., “La Belle Ferronnière”, “Ginevra de’ Benci”), we investigate the ability of DaVinciFace to account for other social categorizations, including gender, race, and age. The experimental results evaluate the effectiveness of our methodology on 1158 portraits, acting on the vector representations of the latent space to produce high-quality portraits that retain the facial features of the subject’s social categories, and conclude that sparser vectors have a greater effect on these features. To objectively evaluate and quantify our results, we solicited human feedback via a crowd-sourcing campaign. Analysis of the human feedback showed a high tolerance for the loss of important identity features in the resulting portraits when the Da Vinci style is more pronounced, with some exceptions, including Africanized individuals. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

10 pages, 1996 KiB  
Communication
Fractional B-Spline Wavelets and U-Net Architecture for Robust and Reliable Vehicle Detection in Snowy Conditions
by Hamam Mokayed, Christián Ulehla, Elda Shurdhaj, Amirhossein Nayebiastaneh, Lama Alkhaled, Olle Hagner and Yan Chai Hum
Sensors 2024, 24(12), 3938; https://doi.org/10.3390/s24123938 - 18 Jun 2024
Cited by 2 | Viewed by 891
Abstract
This paper addresses the critical need for advanced real-time vehicle detection methodologies in Vehicle Intelligence Systems (VIS), especially in the context of using Unmanned Aerial Vehicles (UAVs) for data acquisition in severe weather conditions, such as heavy snowfall typical of the Nordic region. Traditional vehicle detection techniques, which often rely on custom-engineered features and deterministic algorithms, fall short in adapting to diverse environmental challenges, leading to a demand for more precise and sophisticated methods. The limitations of current architectures, particularly when deployed in real-time on edge devices with restricted computational capabilities, are highlighted as significant hurdles in the development of efficient vehicle detection systems. To bridge this gap, our research focuses on the formulation of an innovative approach that combines the fractional B-spline wavelet transform with a tailored U-Net architecture, operational on a Raspberry Pi 4. This method aims to enhance vehicle detection and localization by leveraging the unique attributes of the NVD dataset, which comprises drone-captured imagery under the harsh winter conditions of northern Sweden. The dataset, featuring 8450 annotated frames with 26,313 vehicles, serves as the foundation for evaluating the proposed technique. The comparative analysis of the proposed method against state-of-the-art detectors, such as YOLO and Faster RCNN, in both accuracy and efficiency on constrained devices, emphasizes the capability of our method to balance the trade-off between speed and accuracy, thereby broadening its utility across various domains. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

31 pages, 15233 KiB  
Article
Development of an Artificial Vision for a Parallel Manipulator Using Machine-to-Machine Technologies
by Arailym Nussibaliyeva, Gani Sergazin, Gulzhamal Tursunbayeva, Arman Uzbekbayev, Nursultan Zhetenbayev, Yerkebulan Nurgizat, Balzhan Bakhtiyar, Sandugash Orazaliyeva and Saltanat Yussupova
Sensors 2024, 24(12), 3792; https://doi.org/10.3390/s24123792 - 11 Jun 2024
Cited by 3 | Viewed by 1684
Abstract
This research focuses on developing an artificial vision system for a flexible delta robot manipulator and integrating it with machine-to-machine (M2M) communication to optimize real-time device interaction. This integration aims to increase the speed of the robotic system and improve its overall performance. The proposed combination of an artificial vision system with M2M communication can detect and recognize targets with high accuracy in real time within the limited space considered for positioning, further localization, and carrying out manufacturing processes such as assembly or sorting of parts. In this study, RGB images are used as input data for the Mask R-CNN algorithm, and the results are processed according to the features of the delta robot arm prototype. The data obtained from Mask R-CNN are adapted for use in the delta robot control system, considering its unique characteristics and positioning requirements. M2M technology enables the robot arm to react quickly to changes, such as moving objects or changes in their position, which is crucial for sorting and packing tasks. The system was tested under near real-world conditions to evaluate its performance and reliability. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
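The abstract describes adapting Mask R-CNN detections to the delta robot's positioning requirements. A minimal sketch of one common adaptation step, reducing a predicted binary instance mask to an image-space pick point (illustrative only; the function name and approach are assumptions, not the authors' code):

```python
import numpy as np

def mask_to_pick_point(mask: np.ndarray) -> tuple[float, float]:
    """Return the (row, col) centroid of a binary instance mask.

    A centroid is a common way to convert an instance-segmentation
    mask into a single target coordinate for a pick-and-place arm.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty mask: no target detected")
    return float(ys.mean()), float(xs.mean())

# Example: a 5x5 mask with a 3x3 object centered at (2, 2)
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
```

In a real pipeline the image-space point would still need to be mapped into the robot's workspace via camera calibration, which this sketch omits.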

19 pages, 9248 KiB  
Article
A Visual Measurement Method for Deep Holes in Composite Material Aerospace Components
by Fantong Meng, Jiankun Yang, Guolin Yang, Haibo Lu, Zhigang Dong, Renke Kang, Dongming Guo and Yan Qin
Sensors 2024, 24(12), 3786; https://doi.org/10.3390/s24123786 - 11 Jun 2024
Viewed by 1441
Abstract
The visual measurement of deep holes in composite material workpieces constitutes a critical step in the robotic assembly of aerospace components. The positioning accuracy of assembly holes significantly impacts the assembly quality of components. However, the complex texture of the composite material surface and mutual interference between the imaging of the inlet and outlet edges of deep holes significantly challenge hole detection. A visual measurement method for deep holes in composite materials based on the radial penalty Laplacian operator is proposed to address the issues by suppressing visual noise and enhancing the features of hole edges. Coupled with a novel inflection-point-removal algorithm, this approach enables the accurate detection of holes with a diameter of 10 mm and a depth of 50 mm in composite material components, achieving a measurement precision of 0.03 mm. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
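The radial penalty Laplacian operator is the paper's own contribution; as a hedged illustration of the starting point it builds on, a plain 4-neighbour Laplacian for edge enhancement can be sketched as follows (the radial penalty term, which down-weights responses away from the hole axis, is omitted):

```python
import numpy as np

def laplacian(img: np.ndarray) -> np.ndarray:
    """Apply the standard 4-neighbour Laplacian kernel.

    Strong positive/negative responses mark intensity steps such as
    a hole edge; border pixels are left at zero for simplicity.
    """
    out = np.zeros_like(img, dtype=float)
    out[1:-1, 1:-1] = (
        img[:-2, 1:-1] + img[2:, 1:-1]      # vertical neighbours
        + img[1:-1, :-2] + img[1:-1, 2:]    # horizontal neighbours
        - 4.0 * img[1:-1, 1:-1]             # centre pixel
    )
    return out
```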

18 pages, 22413 KiB  
Article
MONET: The Minor Body Generator Tool at DART Lab
by Carmine Buonagura, Mattia Pugliatti and Francesco Topputo
Sensors 2024, 24(11), 3658; https://doi.org/10.3390/s24113658 - 5 Jun 2024
Viewed by 1103
Abstract
Minor bodies exhibit considerable variability in shape and surface morphology, posing challenges for spacecraft operations, which are further compounded by highly non-linear dynamics and limited communication windows with Earth. Additionally, uncertainties persist in the shape and surface morphology of minor bodies due to errors in ground-based estimation techniques. The growing need for autonomy underscores the importance of robust image processing and visual-based navigation methods. To address this demand, it is essential to conduct tests on a variety of body shapes and with different surface morphological features. This work introduces the procedural Minor bOdy geNErator Tool (MONET), implemented using an open-source 3D computer graphics software. The starting point of MONET is the three-dimensional mesh of a generic minor body, which is procedurally modified by introducing craters, boulders, and surface roughness, resulting in a photorealistic model. MONET offers the flexibility to generate a diverse range of shapes and surface morphological features, aiding in the recreation of various minor bodies. Users can fine-tune relevant parameters to create the desired conditions based on the specific application requirements. The tool offers the capability to generate two default families of models: rubble-pile, characterized by numerous different-sized boulders, and comet-like, reflecting the typical morphology of comets. MONET serves as a valuable resource for researchers and engineers involved in minor body exploration missions and related projects, providing insights into the adaptability and effectiveness of guidance and navigation techniques across a wide range of morphological scenarios. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

17 pages, 13820 KiB  
Article
Design and Implementation of a Self-Supervised Algorithm for Vein Structural Patterns Analysis Using Advanced Unsupervised Techniques
by Swati Rastogi, Siddhartha Prakash Duttagupta and Anirban Guha
Mach. Learn. Knowl. Extr. 2024, 6(2), 1193-1209; https://doi.org/10.3390/make6020056 - 31 May 2024
Cited by 1 | Viewed by 1280
Abstract
Compared with other identity verification applications, vein patterns have the lowest potential for fraudulent use. The present research examines the practicability of gathering vascular data from NIR images of veins. In this study, we propose a self-supervision learning algorithm that envisions an automated process to retrieve vascular patterns computationally using unsupervised approaches. This new self-learning algorithm sorts the vascular patterns into clusters and then uses 2D image data to recover the extracted vascular patterns linked to NIR templates. Our work incorporates multi-scale filtering followed by multi-scale feature extraction, recognition, identification, and matching. We design the ORC, GPO, and RDM algorithms with these inclusions and finally develop the vascular pattern mining model to visualize the computational retrieval of vascular patterns from NIR imagery. As a result, the developed self-supervised learning algorithm shows a 96.7% accuracy rate utilizing appropriate image quality assessment parameters. In our work, we also contend that we provide strategies that are both theoretically sound and practically efficient for concerns such as how many clusters should be used for specific tasks, which clustering technique should be used, how to set the threshold for single linkage algorithms, and how much data should be excluded as outliers. Consequently, we aim to circumvent Kleinberg’s impossibility while attaining significant clustering to develop a self-supervised learning algorithm using unsupervised methodologies. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
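The abstract raises the practical question of where to cut a single-linkage dendrogram. A generic SciPy sketch of threshold-based single-linkage clustering (the threshold and data here are illustrative, not the paper's):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_patterns(features: np.ndarray, threshold: float) -> np.ndarray:
    """Single-linkage clustering with a distance cut-off.

    Returns one integer label per row of `features`; the cut-off
    plays the role discussed in the paper when deciding how many
    clusters a set of patterns should form.
    """
    Z = linkage(features, method="single")
    return fcluster(Z, t=threshold, criterion="distance")

# Two well-separated pairs of points should yield two clusters.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = cluster_patterns(pts, threshold=1.0)
```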

18 pages, 16066 KiB  
Article
A Novel Frame-Selection Metric for Video Inpainting to Enhance Urban Feature Extraction
by Yuhu Feng, Jiahuan Zhang, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama
Sensors 2024, 24(10), 3035; https://doi.org/10.3390/s24103035 - 10 May 2024
Cited by 1 | Viewed by 1109
Abstract
In our digitally driven society, advances in software and hardware to capture video data allow extensive gathering and analysis of large datasets. This has stimulated interest in extracting information from video data, such as buildings and urban streets, to enhance understanding of the environment. Urban buildings and streets, as essential parts of cities, carry valuable information relevant to daily life. Extracting features from these elements and integrating them with technologies such as VR and AR can contribute to more intelligent and personalized urban public services. Despite its potential benefits, collecting videos of urban environments introduces challenges because of the presence of dynamic objects. The varying shape of the target building in each frame necessitates careful selection to ensure the extraction of quality features. To address this problem, we propose a novel evaluation metric that considers both the video-inpainting-restoration quality and the relevance of the target object by minimizing areas with cars, maximizing areas with the target building, and minimizing overlapping areas. This metric extends existing video-inpainting-evaluation metrics by considering the relevance of the target object and the interconnectivity between objects. We conducted experiments to validate the proposed metric using real-world datasets from the Japanese cities of Sapporo and Yokohama. The experimental results demonstrate the feasibility of selecting video frames conducive to building feature extraction. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
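The metric balances three terms: building area (maximized), car area (minimized), and their overlap (minimized). A toy sketch over binary masks, with equal illustrative weights that are not the paper's:

```python
import numpy as np

def frame_score(building: np.ndarray, car: np.ndarray) -> float:
    """Score a frame for building-feature extraction.

    Rewards visible building pixels and penalises car pixels and
    building-car overlap; higher scores suggest frames better
    suited to feature extraction.
    """
    total = building.size
    b = building.sum() / total            # target-building coverage
    c = car.sum() / total                 # dynamic-object coverage
    overlap = (building & car).sum() / total
    return float(b - c - overlap)
```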

42 pages, 122307 KiB  
Article
Toward Synthetic Physical Fingerprint Targets
by Laurenz Ruzicka, Bernhard Strobl, Stephan Bergmann, Gerd Nolden, Tom Michalsky, Christoph Domscheit, Jannis Priesnitz, Florian Blümel, Bernhard Kohn and Clemens Heitzinger
Sensors 2024, 24(9), 2847; https://doi.org/10.3390/s24092847 - 29 Apr 2024
Viewed by 1412
Abstract
Biometric fingerprint identification hinges on the reliability of its sensors; however, calibrating and standardizing these sensors poses significant challenges, particularly in regards to repeatability and data diversity. To tackle these issues, we propose methodologies for fabricating synthetic 3D fingerprint targets, or phantoms, that closely emulate real human fingerprints. These phantoms enable the precise evaluation and validation of fingerprint sensors under controlled and repeatable conditions. Our research employs laser engraving, 3D printing, and CNC machining techniques, utilizing different materials. We assess the phantoms’ fidelity to synthetic fingerprint patterns, intra-class variability, and interoperability across different manufacturing methods. The findings demonstrate that a combination of laser engraving or CNC machining with silicone casting produces finger-like phantoms with high accuracy and consistency for rolled fingerprint recordings. For slap recordings, direct laser engraving of flat silicone targets excels, and in the contactless fingerprint sensor setting, 3D printing and silicone filling provide the most favorable attributes. Our work enables a comprehensive, method-independent comparison of various fabrication methodologies, offering a unique perspective on the strengths and weaknesses of each approach. This facilitates a broader understanding of fingerprint recognition system validation and performance assessment. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

14 pages, 9029 KiB  
Article
Optimizing Cattle Behavior Analysis in Precision Livestock Farming: Integrating YOLOv7-E6E with AutoAugment and GridMask to Enhance Detection Accuracy
by Hyeon-seok Sim, Tae-kyeong Kim, Chang-woo Lee, Chang-sik Choi, Jin Soo Kim and Hyun-chong Cho
Appl. Sci. 2024, 14(9), 3667; https://doi.org/10.3390/app14093667 - 25 Apr 2024
Viewed by 1208
Abstract
Recently, the growing demand for meat has increased interest in precision livestock farming (PLF), wherein monitoring livestock behavior is crucial for assessing animal health. We introduce a novel cattle behavior detection model that leverages data from 2D RGB cameras. It primarily employs you only look once (YOLO)v7-E6E, which is a real-time object detection framework renowned for its efficiency across various applications. Notably, the proposed model enhances network performance without incurring additional inference costs. We focused on enhancing and evaluating the model by integrating AutoAugment and GridMask to augment the original dataset. AutoAugment, a reinforcement learning algorithm, was employed to determine the most effective data augmentation policy. Concurrently, we applied GridMask, a novel data augmentation technique that systematically eliminates square regions in a grid pattern to improve model robustness. Our results revealed that when trained on the original dataset, the model achieved a mean average precision (mAP) of 88.2%, which increased by 2.9% after applying AutoAugment. The performance was further improved by combining AutoAugment and GridMask, resulting in a notable 4.8% increase in the mAP, thereby achieving a final mAP of 93.0%. This demonstrates the efficacy of these augmentation strategies in improving cattle behavior detection for PLF. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
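GridMask, as described, systematically eliminates square regions in a grid pattern. A minimal NumPy sketch with illustrative parameters (the published technique also randomizes grid offsets and rotations per sample, which is omitted here):

```python
import numpy as np

def grid_mask(img: np.ndarray, unit: int = 4, ratio: float = 0.5) -> np.ndarray:
    """Zero out square regions arranged in a regular grid.

    `unit` is the grid period in pixels and `ratio` the side length
    of each dropped square relative to the period.
    """
    h, w = img.shape[:2]
    d = max(1, int(unit * ratio))
    mask = np.ones((h, w), dtype=img.dtype)
    for y in range(0, h, unit):
        for x in range(0, w, unit):
            mask[y:y + d, x:x + d] = 0    # drop a d-by-d square
    return img * mask
```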

23 pages, 10179 KiB  
Article
A Degraded Finger Vein Image Recovery and Enhancement Algorithm Based on Atmospheric Scattering Theory
by Dingzhong Feng, Peng Feng, Yongbo Mao, Yang Zhou, Yuqing Zeng and Ye Zhang
Sensors 2024, 24(9), 2684; https://doi.org/10.3390/s24092684 - 24 Apr 2024
Viewed by 1423
Abstract
With the development of biometric identification technology, finger vein identification has received increasingly widespread attention for its security, efficiency, and stability. However, because of the performance of the current standard finger vein image acquisition device and the complex internal organization of the finger, the acquired images are often heavily degraded and have lost their texture characteristics. This makes the topology of the finger veins inconspicuous or even difficult to distinguish, greatly affecting the identification accuracy. Therefore, this paper proposes a finger vein image recovery and enhancement algorithm using atmospheric scattering theory. Firstly, to normalize the local over-bright and over-dark regions of finger vein images within a certain threshold, the Gamma transform method is improved in this paper to correct and measure the gray value of a given image. Then, we reconstruct the image based on atmospheric scattering theory and design a pixel mutation filter to segment the venous and non-venous contact zones. Finally, the degraded finger vein images are recovered and enhanced by global image gray value normalization. Experiments on SDUMLA-HMT and ZJ-UVM datasets show that our proposed method effectively achieves the recovery and enhancement of degraded finger vein images. The image restoration and enhancement algorithm proposed in this paper performs well in finger vein recognition using traditional methods, machine learning, and deep learning. The recognition accuracy of the processed image is improved by more than 10% compared to the original image. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
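The paper improves the Gamma transform to normalize over-bright and over-dark regions. As a hedged reference point, a standard global adaptive gamma correction, which picks the exponent so the image mean maps to mid-gray, can be sketched as follows (the paper's region-aware variant is not reproduced):

```python
import numpy as np

def adaptive_gamma(img: np.ndarray) -> np.ndarray:
    """Global adaptive gamma correction toward a mid-gray mean.

    Chooses gamma so that mean**gamma == 0.5, brightening dark
    images and darkening bright ones.
    """
    x = img.astype(float) / 255.0
    mean = np.clip(x.mean(), 1e-6, 1 - 1e-6)
    gamma = np.log(0.5) / np.log(mean)
    return (x ** gamma * 255.0).astype(np.uint8)
```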

13 pages, 2283 KiB  
Article
Development and Implementation of an Innovative Framework for Automated Radiomics Analysis in Neuroimaging
by Chiara Camastra, Giovanni Pasini, Alessandro Stefano, Giorgio Russo, Basilio Vescio, Fabiano Bini, Franco Marinozzi and Antonio Augimeri
J. Imaging 2024, 10(4), 96; https://doi.org/10.3390/jimaging10040096 - 22 Apr 2024
Viewed by 1953
Abstract
Radiomics represents an innovative approach to medical image analysis, enabling comprehensive quantitative evaluation of radiological images through advanced image processing and Machine or Deep Learning algorithms. This technique uncovers intricate data patterns beyond human visual detection. Traditionally, executing a radiomic pipeline involves multiple standardized phases across several software platforms. This limitation was overcome by the development of the matRadiomics application. MatRadiomics, a freely available, IBSI-compliant tool, features an intuitive Graphical User Interface (GUI) that facilitates the entire radiomics workflow from DICOM image importation to segmentation, feature selection and extraction, and Machine Learning model construction. In this project, an extension of matRadiomics was developed to support the importation of brain MRI images and segmentations in NIfTI format, thus extending its applicability to neuroimaging. This enhancement allows for the seamless execution of radiomic pipelines within matRadiomics, offering substantial advantages to the realm of neuroimaging. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

19 pages, 5712 KiB  
Article
Soil Sampling Map Optimization with a Dual Deep Learning Framework
by Tan-Hanh Pham and Kim-Doang Nguyen
Mach. Learn. Knowl. Extr. 2024, 6(2), 751-769; https://doi.org/10.3390/make6020035 - 29 Mar 2024
Cited by 1 | Viewed by 1773
Abstract
Soil sampling constitutes a fundamental process in agriculture, enabling precise soil analysis and optimal fertilization. The automated selection of accurate soil sampling locations representative of a given field is critical for informed soil treatment decisions. This study leverages recent advancements in deep learning to develop efficient tools for generating soil sampling maps. We propose two models, namely UDL and UFN, which are the results of innovations in machine learning architecture design and integration. The models are meticulously trained on a comprehensive soil sampling dataset collected from local farms in South Dakota. The data include five key attributes: aspect, flow accumulation, slope, normalized difference vegetation index, and yield. The inputs to the models consist of multispectral images, and the ground truths are highly unbalanced binary images. To address this challenge, we introduce a feature extraction technique to find patterns and characteristics from the data before using these refined features for further processing and generating soil sampling maps. Our approach is centered around building a refiner that extracts fine features and a selector that utilizes these features to produce prediction maps containing the selected optimal soil sampling locations. Our experimental results demonstrate the superiority of our tools compared to existing methods. During testing, our proposed models exhibit outstanding performance, achieving the highest mean Intersection over Union of 60.82% and mean Dice Coefficient of 73.74%. The research not only introduces an innovative tool for soil sampling but also lays the foundation for the integration of traditional and modern soil sampling methods. This work provides a promising solution for precision agriculture and soil management. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
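The two evaluation scores quoted above, Intersection over Union and the Dice coefficient, have standard definitions for binary prediction maps; a small sketch of both:

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """IoU and Dice coefficient for binary maps.

    IoU = |pred ∩ truth| / |pred ∪ truth|;
    Dice = 2 |pred ∩ truth| / (|pred| + |truth|).
    Both are defined as 1.0 when both maps are empty.
    """
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    sizes = pred.sum() + truth.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / sizes if sizes else 1.0
    return float(iou), float(dice)
```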

18 pages, 2824 KiB  
Article
Time of Flight Distance Sensor–Based Construction Equipment Activity Detection Method
by Young-Jun Park and Chang-Yong Yi
Appl. Sci. 2024, 14(7), 2859; https://doi.org/10.3390/app14072859 - 28 Mar 2024
Cited by 1 | Viewed by 1310
Abstract
In this study, we delve into a novel approach by employing a sensor-based pattern recognition model to address the automation of construction equipment activity analysis. The model integrates time of flight (ToF) sensors with deep convolutional neural networks (DCNNs) to accurately classify the operational activities of construction equipment, focusing on piston movements. The research utilized a one-twelfth-scale excavator model, processing the displacement ratios of its pistons into a unified dataset for analysis. Methodologically, the study outlines the setup of the sensor modules and their integration with a controller, emphasizing the precision in capturing equipment dynamics. The DCNN model, characterized by its four-layered convolutional blocks, was meticulously tuned within the MATLAB environment, demonstrating the model’s learning capabilities through hyperparameter optimization. An analysis of 2070 samples representing six distinct excavator activities yielded an impressive average precision of 95.51% and a recall of 95.31%, with an overall model accuracy of 95.19%. When compared against other vision-based and accelerometer-based methods, the proposed model showcases enhanced performance and reliability under controlled experimental conditions. This substantiates its potential for practical application in real-world construction scenarios, marking a significant advancement in the field of construction equipment monitoring. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

19 pages, 7570 KiB  
Article
Semantic Segmentation of Remote Sensing Images Depicting Environmental Hazards in High-Speed Rail Network Based on Large-Model Pre-Classification
by Qi Dong, Xiaomei Chen, Lili Jiang, Lin Wang, Jiachong Chen and Ying Zhao
Sensors 2024, 24(6), 1876; https://doi.org/10.3390/s24061876 - 14 Mar 2024
Cited by 3 | Viewed by 1308
Abstract
With the rapid development of China’s railways, ensuring the safety of the operating environment of high-speed railways faces daunting challenges. In response to safety hazards posed by light and heavy floating objects during the operation of trains, we propose a dual-branch semantic segmentation network with the fusion of large models (SAMUnet). The encoder part of this network uses a dual-branch structure, in which the backbone branch uses a residual network for feature extraction and the large-model branch leverages the results of feature extraction generated by the segment anything model (SAM). Moreover, a decoding attention module is fused with the results of prediction of the SAM in the decoder part to enhance the performance of the network. We conducted experiments on the Inria Aerial Image Labeling (IAIL), Massachusetts, and high-speed railway hazards datasets to verify the effectiveness and applicability of the proposed SAMUnet network in comparison with commonly used semantic segmentation networks. The results demonstrated its superiority in terms of both the accuracies of segmentation and feature extraction. It was able to precisely extract hazards in the environment of high-speed railways to significantly improve the accuracy of semantic segmentation. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

14 pages, 3336 KiB  
Article
Dazzling Evaluation of the Impact of a High-Repetition-Rate CO2 Pulsed Laser on Infrared Imaging Systems
by Hanyu Zheng, Yunzhe Wang, Yang Liu, Tao Sun and Junfeng Shao
Sensors 2024, 24(6), 1827; https://doi.org/10.3390/s24061827 - 12 Mar 2024
Viewed by 1216
Abstract
This article utilizes the Canny edge extraction algorithm based on contour curvature and the cross-correlation template matching algorithm to extensively study the impact of a high-repetition-rate CO2 pulsed laser on the target extraction and tracking performance of an infrared imaging detector. It establishes a quantified dazzling pattern for lasers on infrared imaging systems. By conducting laser dazzling and damage experiments, a detailed analysis of the normalized correlation between the target and the dazzling images is performed to quantitatively describe the laser dazzling effects. Simultaneously, an evaluation system, including target distance and laser power evaluation factors, is established to determine the dazzling level and whether the target is recognizable. The research results reveal that the laser power and target position are crucial factors affecting the detection performance of infrared imaging detector systems under laser dazzling. Different laser powers are required to successfully interfere with the recognition algorithm of the infrared imaging detector at different distances. Moreover, laser dazzling produces a considerable quantity of false edge information, which seriously affects the performance of the pattern recognition algorithm. In laser damage experiments, the detector experienced functional damage, with a quarter of the image displaying as completely black. The energy density threshold required for the functional damage of the detector is approximately 3 J/cm2. The dazzling assessment conclusions also apply to the evaluation of the damage results. Finally, the proposed evaluation formula aligns with the experimental results, objectively reflecting the actual impact of laser dazzling on the target extraction and the tracking performance of infrared imaging systems. This study provides an in-depth and accurate analysis for understanding the influence of lasers on the performance of infrared imaging detectors. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
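The normalized correlation between the target and the dazzled image can be computed with a standard zero-mean normalized cross-correlation; a generic sketch (not necessarily the paper's exact formulation):

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equal-size patches.

    Values near 1 mean the dazzled image still matches the target
    template; values near 0 mean the match has been destroyed.
    """
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom else 0.0
```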

23 pages, 3795 KiB  
Article
Classifying Breast Tumors in Digital Tomosynthesis by Combining Image Quality-Aware Features and Tumor Texture Descriptors
by Loay Hassan, Mohamed Abdel-Nasser, Adel Saleh and Domenec Puig
Mach. Learn. Knowl. Extr. 2024, 6(1), 619-641; https://doi.org/10.3390/make6010029 - 11 Mar 2024
Viewed by 2094
Abstract
Digital breast tomosynthesis (DBT) is a 3D breast cancer screening technique that can overcome the limitations of standard 2D digital mammography. However, DBT images often suffer from artifacts stemming from acquisition conditions, a limited angular range, and low radiation doses. These artifacts have the potential to degrade the performance of automated breast tumor classification tools. Notably, most existing automated breast tumor classification methods do not consider the effect of DBT image quality when designing the classification models. In contrast, this paper introduces a novel deep learning-based framework for classifying breast tumors in DBT images. This framework combines global image quality-aware features with tumor texture descriptors. The proposed approach employs a two-branch model: in the top branch, a deep convolutional neural network (CNN) model is trained to extract robust features from the region of interest that includes the tumor. In the bottom branch, a deep learning model named TomoQA is trained to extract global image quality-aware features from input DBT images. The quality-aware features and the tumor descriptors are then combined and fed into a fully-connected layer to classify breast tumors as benign or malignant. The unique advantage of this model is the combination of DBT image quality-aware features with tumor texture descriptors, which helps accurately classify breast tumors as benign or malignant. Experimental results on a publicly available DBT image dataset demonstrate that the proposed framework achieves superior breast tumor classification results, outperforming all existing deep learning-based methods. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
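The two-branch model ends by concatenating the quality-aware features with the tumor descriptors and feeding them to a fully-connected layer. That fusion step can be sketched in NumPy with placeholder weights (the function name and weights are illustrative, not trained values from the paper):

```python
import numpy as np

def fuse_and_score(texture: np.ndarray, quality: np.ndarray,
                   w: np.ndarray, b: float) -> float:
    """Concatenate two feature vectors and apply one fully-connected
    unit with a sigmoid, yielding a benign/malignant probability."""
    z = np.concatenate([texture, quality]) @ w + b
    return float(1.0 / (1.0 + np.exp(-z)))
```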

18 pages, 5765 KiB  
Article
Real-Time Cucumber Target Recognition in Greenhouse Environments Using Color Segmentation and Shape Matching
by Wenbo Liu, Haonan Sun, Yu Xia and Jie Kang
Appl. Sci. 2024, 14(5), 1884; https://doi.org/10.3390/app14051884 - 25 Feb 2024
Viewed by 1217
Abstract
Accurate identification of fruits in greenhouse environments is an essential need for the precise functioning of agricultural robots. This study presents a solution to the problem of distinguishing cucumber fruits from their stems and leaves, which often have similar colors in their natural environment. The proposed algorithm for cucumber fruit identification relies on color segmentation and shape matching. First, we extract boundary details from the acquired image of the cucumber sample. The edge information is described and reconstructed using a shape descriptor known as the Fourier descriptor in order to acquire a matching template image. Subsequently, we generate a multi-scale template by combining computational and real-world data. The target image is subjected to color conditioning in order to enhance the segmentation of the target region inside the HSV color space. Then, the segmented target region is compared to the multi-scale template based on its shape. The method of color segmentation decreases the presence of unwanted information in the target image, hence improving the effectiveness of shape matching. An analysis was performed on a set of 200 cucumber photos obtained from the field. The findings indicate that the method presented in this study surpasses conventional recognition algorithms in terms of accuracy and efficiency, with a recognition rate of up to 86%. Moreover, the system shows strong proficiency in identifying cucumber targets within greenhouses, making it a valuable resource for providing technical assistance to agricultural robots that operate with accuracy. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

19 pages, 587 KiB  
Article
CRAS: Curriculum Regularization and Adaptive Semi-Supervised Learning with Noisy Labels
by Ryota Higashimoto, Soh Yoshida and Mitsuji Muneyasu
Appl. Sci. 2024, 14(3), 1208; https://doi.org/10.3390/app14031208 - 31 Jan 2024
Cited by 2 | Viewed by 1187
Abstract
This paper addresses the performance degradation of deep neural networks caused by learning with noisy labels. Recent research on this topic has exploited the memorization effect: networks fit data with clean labels during the early stages of learning and only later memorize data with noisy labels. This property allows clean and noisy samples to be separated from the loss distribution. In recent years, semi-supervised learning, which divides training data into a set of labeled clean samples and a set of unlabeled noisy samples, has achieved impressive results. However, this strategy has two significant problems: (1) the accuracy of dividing the data into clean and noisy samples depends strongly on the network’s performance, and (2) if the divided data are biased towards the unlabeled samples, few labeled samples remain, causing the network to overfit to the labels and leading to poor generalization performance. To solve these problems, we propose the curriculum regularization and adaptive semi-supervised learning (CRAS) method. Its key ideas are (1) to train the network with robust regularization techniques as a warm-up before dividing the data, and (2) to control the strength of the regularization using loss weights that adaptively respond to data bias, which varies with each split at each training epoch. We evaluated the performance of CRAS on benchmark image classification datasets, CIFAR-10 and CIFAR-100, and real-world datasets, mini-WebVision and Clothing1M. The findings demonstrate that CRAS excels at handling noisy labels, yielding superior generalization and robustness across a range of noise rates compared with existing methods. Full article
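The memorization-effect division this line of work builds on can be sketched with a simple 1-D two-means split of per-sample losses. This is a lightweight stand-in for the Gaussian-mixture split commonly used in semi-supervised noisy-label methods, not the CRAS implementation, and the loss values are invented:

```python
import numpy as np

def split_by_loss(losses, iters=10):
    """Split samples into 'clean' (low-loss) and 'noisy' (high-loss) sets
    with 1-D two-means clustering on per-sample losses, exploiting the
    memorization effect: clean samples are fit first, so they have low loss."""
    losses = np.asarray(losses, dtype=float)
    c = np.array([losses.min(), losses.max()])  # initial low/high centroids
    for _ in range(iters):
        # assign each sample to its nearest centroid, then update centroids
        assign = np.abs(losses[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                c[k] = losses[assign == k].mean()
    clean = np.where(assign == 0)[0]  # cluster nearer the low centroid
    noisy = np.where(assign == 1)[0]
    return clean, noisy

losses = [0.05, 0.1, 0.08, 2.1, 1.9, 0.07, 2.3]
clean, noisy = split_by_loss(losses)
print(clean.tolist())  # [0, 1, 2, 5]
```

The ratio `len(clean) / len(losses)` from such a split is the kind of quantity an adaptive loss weight could respond to, since it measures how biased the division is toward the unlabeled set.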
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

16 pages, 3585 KiB  
Article
Enhancement of GUI Display Error Detection Using Improved Faster R-CNN and Multi-Scale Attention Mechanism
by Xi Pan, Zhan Huan, Yimang Li and Yingying Cao
Appl. Sci. 2024, 14(3), 1144; https://doi.org/10.3390/app14031144 - 30 Jan 2024
Viewed by 1747
Abstract
Graphical user interfaces (GUIs) hold an irreplaceable position in modern software and applications, serving as the medium through which users interact. Because of differences among terminal devices, display errors sometimes occur during software rendering, such as component occlusion, image loss, text overlap, and empty values. To address these four common GUI display errors, a target detection algorithm based on an improved Faster R-CNN is proposed. Specifically, ResNet-50 is used instead of the traditional VGG-16 as the feature extraction network. The feature pyramid network (FPN) and the enhanced multi-scale attention (EMA) mechanism are introduced to improve accuracy. ROI-Align is used instead of ROI-Pooling to enhance the generalization capability of the network. Since training such models requires a large number of labeled screenshots of errors, and no dataset of GUI display problems is currently publicly available, a training data generation algorithm has been developed that automatically generates screenshots with GUI display problems based on the Rico dataset. Experimental results show that the improved Faster R-CNN achieves a detection accuracy of 87.3% on the generated GUI problem dataset, a 7% improvement over the original Faster R-CNN. Full article
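The benefit of swapping ROI-Pooling for ROI-Align comes down to sampling features at fractional coordinates instead of quantized ones. A toy numpy sketch of the difference, using an illustrative feature map rather than anything from the paper's network:

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly sample feature map `feat` at a fractional location (y, x),
    as ROI-Align does, instead of truncating to an integer cell as
    ROI-Pooling's quantization effectively does."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

feat = np.arange(16, dtype=float).reshape(4, 4)

# Sampling the same ROI point (1.5, 2.5) two ways:
print(feat[int(1.5), int(2.5)])  # 6.0 -- quantized lookup loses sub-cell position
print(bilinear(feat, 1.5, 2.5))  # 8.5 -- interpolated value preserves it
```

For small targets such as thin GUI components, this sub-cell misalignment is a meaningful fraction of the object size, which is why the interpolated variant generalizes better.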
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

16 pages, 3370 KiB  
Article
Deep Learning-Based Technique for Remote Sensing Image Enhancement Using Multiscale Feature Fusion
by Ming Zhao, Rui Yang, Min Hu and Botao Liu
Sensors 2024, 24(2), 673; https://doi.org/10.3390/s24020673 - 21 Jan 2024
Cited by 9 | Viewed by 2314
Abstract
The present study proposes a novel deep-learning model for remote sensing image enhancement that maintains image details while enhancing brightness in the feature extraction module. An improved hierarchical model, the Global Spatial Attention Network (GSA-Net), based on U-Net, is proposed for image enhancement. To circumvent the issue of insufficient sample data, gamma correction is applied to create low-light images, which are then used as training examples. A loss function is constructed using the Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR) indices. The GSA-Net network and this loss function are used to restore low-light remote sensing images. The proposed method was tested on the Northwestern Polytechnical University Very-High-Resolution 10 (NWPU VHR-10) dataset, and its overall superiority was demonstrated against other state-of-the-art algorithms using objective assessment indicators such as PSNR, SSIM, and Learned Perceptual Image Patch Similarity (LPIPS). Furthermore, in high-level visual tasks such as object detection, this method provides remote sensing images with more distinct details and higher contrast than the competing methods. Full article
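The gamma-correction trick for synthesizing low-light training pairs, and the PSNR index used in the loss, can be sketched as follows; the gamma value and the toy image are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def gamma_darken(img, gamma=2.5):
    """Synthesize a low-light image from a normal one via gamma correction:
    out = in ** gamma on intensities normalized to [0, 1]; gamma > 1 darkens."""
    x = img.astype(np.float64) / 255.0
    return np.clip((x ** gamma) * 255.0, 0, 255).astype(np.uint8)

def psnr(ref, test):
    """Peak Signal-to-Noise Ratio between two 8-bit images, in dB."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

img = np.full((8, 8), 128, dtype=np.uint8)  # toy mid-gray image
dark = gamma_darken(img, gamma=2.5)         # synthetic low-light counterpart
print(int(dark[0, 0]) < 128)  # True -- every pixel is darkened
```

Pairs like `(dark, img)` then serve as (input, ground-truth) training examples, and a PSNR/SSIM-based loss rewards restorations that move the dark input back toward the reference.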
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

19 pages, 20196 KiB  
Article
Synthetic Document Images with Diverse Shadows for Deep Shadow Removal Networks
by Yuhi Matsuo and Yoshimitsu Aoki
Sensors 2024, 24(2), 654; https://doi.org/10.3390/s24020654 - 19 Jan 2024
Cited by 2 | Viewed by 2144
Abstract
Shadow removal for document images is an essential task for digitized document applications. Recent shadow removal models have been trained on pairs of shadow images and shadow-free images. However, obtaining a large, diverse dataset for document shadow removal takes time and effort, so only small real datasets are available. Graphic renderers have been used to synthesize shadows and create relatively large datasets, but the limited number of unique documents and lighting environments adversely affects network performance. This paper presents a large-scale, diverse dataset called the Synthetic Document with Diverse Shadows (SynDocDS) dataset. SynDocDS comprises rendered images with diverse shadows augmented by a physics-based illumination model and can be used to train a more robust, higher-performance deep shadow removal network. We further propose a Dual Shadow Fusion Network (DSFN). Unlike natural images, document images often have constant background colors, which requires a strong grasp of global color features when training a deep shadow removal network. The DSFN captures global color and shadow-region context and efficiently merges shadow attention maps with image features. We conduct experiments on three publicly available datasets, the OSR, Kligler’s, and Jung’s datasets, to validate the proposed method’s effectiveness. Compared with training on existing synthetic datasets, training our model on the SynDocDS dataset improves the average PSNR from 23.00 dB to 25.70 dB and the average SSIM from 0.959 to 0.971. In addition, the experiments demonstrate that the DSFN clearly outperforms other networks across multiple metrics, including the PSNR, the SSIM, and their impact on OCR performance. Full article
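The core idea of compositing synthetic shadows onto shadow-free documents can be sketched as per-pixel multiplicative attenuation. This is a deliberate simplification of the paper's physics-based illumination model, with invented values:

```python
import numpy as np

def add_shadow(doc, shadow_map):
    """Composite a synthetic shadow onto a shadow-free document image by
    multiplying each pixel by an illumination factor in [0, 1]
    (1 = fully lit, smaller values = deeper shadow)."""
    out = doc.astype(np.float64) * shadow_map
    return np.clip(out, 0, 255).astype(np.uint8)

doc = np.full((4, 6), 240, dtype=np.uint8)  # bright page background
shadow = np.ones((4, 6))
shadow[:, :3] = 0.5                          # left half in shadow
shadowed = add_shadow(doc, shadow)
print(shadowed[0, 0], shadowed[0, 5])  # 120 240
```

A physics-based renderer produces far richer `shadow_map`s (soft penumbras, colored ambient light), but the pairing is the same: `(shadowed, doc)` becomes a training pair with a known shadow-free ground truth.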
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

15 pages, 11956 KiB  
Article
Decomposition Technique for Bio-Transmittance Imaging Based on Attenuation Coefficient Matrix Inverse
by Purnomo Sidi Priambodo, Toto Aminoto and Basari Basari
J. Imaging 2024, 10(1), 22; https://doi.org/10.3390/jimaging10010022 - 15 Jan 2024
Viewed by 1864
Abstract
Human body tissue disease diagnosis will become more accurate if transmittance images, such as X-ray images, are separated according to each constituent tissue. This research proposes a new image decomposition technique based on the matrix inverse method for biological tissue images. The fundamental idea is that when k different monochromatic lights penetrate a biological tissue, they experience different attenuation coefficients; likewise, when one monochromatic light penetrates k different biological tissues, it experiences a different attenuation coefficient in each. These attenuation coefficients are arranged into a unique k×k square matrix. The k images taken under the k different monochromatic lights are then merged into an image vector entity, and a matrix inverse operation is performed on the merged images, producing k thickness images of the constituent tissues. This research demonstrates that the proposed method effectively decomposes images of biological objects into separate images, each showing the thickness distribution of a different constituent tissue. In the future, this proposed new technique is expected to contribute to supporting medical imaging analysis. Full article
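The matrix-inverse decomposition follows from the Beer–Lambert law: for wavelength i, the transmitted intensity is I_i = I0·exp(−Σ_j μ[i,j]·t[j]), so the log-attenuations b = −ln(I/I0) satisfy the linear system μ·t = b, and inverting the k×k coefficient matrix recovers the k thicknesses. A minimal numpy sketch with hypothetical attenuation coefficients (k = 2 wavelengths and tissues; the values are illustrative only):

```python
import numpy as np

mu = np.array([[0.5, 1.2],      # hypothetical attenuation coefficients:
               [1.1, 0.3]])     # rows = wavelengths, columns = tissues
t_true = np.array([2.0, 1.5])   # true per-pixel tissue thicknesses

I0 = 1.0
I = I0 * np.exp(-mu @ t_true)   # simulated transmitted intensities (Beer-Lambert)
b = -np.log(I / I0)             # measured log-attenuations
t_rec = np.linalg.solve(mu, b)  # decomposition via the matrix inverse

print(np.allclose(t_rec, t_true))  # True -- thicknesses recovered exactly
```

In the imaging setting, this solve is applied per pixel (or vectorized across the whole image stack), turning k monochromatic transmittance images into k tissue-thickness images, provided the coefficient matrix is well conditioned.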
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)