Topic Editors

Prof. Dr. Bin Fan
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Dr. Wenqi Ren
School of Cyber Science and Technology, Sun Yat-Sen University, Guangzhou 510275, China

Applications in Image Analysis and Pattern Recognition

Abstract submission deadline
31 May 2024
Manuscript submission deadline
31 August 2024
Viewed by
151458

Topic Information

Dear Colleagues,

It is estimated that up to ~80% of the neurons in the human brain are involved in processing visual information and cognition. Image analysis and pattern recognition are therefore at the core of artificial intelligence, which aims to design computer programs that achieve or mimic human-like intelligence in perceiving and reasoning about the real world. With the rapid development of visual sensors and imaging technologies, image analysis and pattern recognition techniques have been applied extensively across artificial intelligence-related areas, from industry and agriculture to surveillance and social security.

Despite the significant success of image analysis and pattern recognition methods over the past decade, their performance in addressing real-world problems is still unsatisfactory. This status indicates a non-negligible gap between theoretical progress and practical application in the related areas. This Topic aims to narrow that gap, and we therefore invite papers on both theoretical and applied issues related to image analysis and pattern recognition.

All interested authors are invited to submit innovative work on aspects including, but not limited to, the following:

  • Deep learning-based methods for image analysis;
  • Deep learning-based methods for video analysis;
  • Image fusion methods and applications;
  • Multimedia systems and applications;
  • Image enhancement and restoration methods and their applications;
  • Image analysis and pattern recognition for robotics and unmanned systems;
  • Document image analysis and applications;
  • Structural pattern recognition methods and applications;
  • Biomedical image analysis and applications;
  • Advances in pattern recognition theories.

Prof. Dr. Bin Fan
Dr. Wenqi Ren
Topic Editors

Keywords

  • image analysis
  • pattern recognition
  • structural pattern recognition
  • computer vision
  • multimedia analysis
  • deep learning
  • document image analysis
  • image enhancement
  • image restoration
  • biomedical image analysis
  • robotics
  • unmanned systems
  • image retrieval
  • image understanding
  • feature extraction
  • image segmentation
  • semantic segmentation
  • object detection
  • image classification
  • image acquisition techniques

Participating Journals

Journal Name                                       Impact Factor   CiteScore   Launched Year   First Decision (median)   APC
Applied Sciences (applsci)                         2.7             4.5         2011            16.9 Days                 CHF 2400
Sensors (sensors)                                  3.9             6.8         2001            17 Days                   CHF 2600
Journal of Imaging (jimaging)                      3.2             4.4         2015            21.7 Days                 CHF 1800
Machine Learning and Knowledge Extraction (make)   3.9             8.5         2019            19.9 Days                 CHF 1800

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics cooperates with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of these benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea with a time-stamped preprint record;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (91 papers)

14 pages, 2020 KiB  
Article
Optimizing Cattle Behavior Analysis in Precision Livestock Farming: Integrating YOLOv7-E6E with AutoAugment and GridMask to Enhance Detection Accuracy
by Hyeon-seok Sim, Tae-kyeong Kim, Chang-woo Lee, Chang-sik Choi, Jin Soo Kim and Hyun-chong Cho
Appl. Sci. 2024, 14(9), 3667; https://doi.org/10.3390/app14093667 - 25 Apr 2024
Abstract
Recently, the growing demand for meat has increased interest in precision livestock farming (PLF), wherein monitoring livestock behavior is crucial for assessing animal health. We introduce a novel cattle behavior detection model that leverages data from 2D RGB cameras. It primarily employs you only look once (YOLO)v7-E6E, which is a real-time object detection framework renowned for its efficiency across various applications. Notably, the proposed model enhances network performance without incurring additional inference costs. We primarily focused on performance enhancement and evaluation of the model by integrating AutoAugment and GridMask to augment the original dataset. AutoAugment, a reinforcement learning algorithm, was employed to determine the most effective data augmentation policy. Concurrently, we applied GridMask, a novel data augmentation technique that systematically eliminates square regions in a grid pattern to improve model robustness. Our results revealed that when trained on the original dataset, the model achieved a mean average precision (mAP) of 88.2%, which increased by 2.9% after applying AutoAugment. The performance was further improved by combining AutoAugment and GridMask, resulting in a notable 4.8% increase in the mAP, thereby achieving a final mAP of 93.0%. This demonstrates the efficacy of these augmentation strategies in improving cattle behavior detection for PLF. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
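
GridMask, as used above, erases square patches arranged in a regular grid. A minimal NumPy sketch of the idea (the grid period and erase ratio below are illustrative assumptions, not the authors' settings):

```python
import numpy as np

def grid_mask(image: np.ndarray, d: int = 40, ratio: float = 0.4) -> np.ndarray:
    """Zero out square regions arranged in a regular grid.

    d     -- grid period in pixels (assumed value)
    ratio -- fraction of each period erased along each axis (assumed value)
    """
    h, w = image.shape[:2]
    mask = np.ones((h, w), dtype=image.dtype)
    erase = int(d * ratio)  # side length of each erased square
    for y in range(0, h, d):
        for x in range(0, w, d):
            mask[y:y + erase, x:x + erase] = 0
    # Broadcast the 2D mask over the channel axis for color images.
    return image * (mask[..., None] if image.ndim == 3 else mask)

# Example: augment a dummy 3-channel image.
augmented = grid_mask(np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8))
```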
23 pages, 10179 KiB  
Article
A Degraded Finger Vein Image Recovery and Enhancement Algorithm Based on Atmospheric Scattering Theory
by Dingzhong Feng, Peng Feng, Yongbo Mao, Yang Zhou, Yuqing Zeng and Ye Zhang
Sensors 2024, 24(9), 2684; https://doi.org/10.3390/s24092684 - 24 Apr 2024
Viewed by 241
Abstract
With the development of biometric identification technology, finger vein identification has received more and more widespread attention for its security, efficiency, and stability. However, because of the performance of the current standard finger vein image acquisition device and the complex internal organization of the finger, the acquired images are often heavily degraded and have lost their texture characteristics. This makes the topology of the finger veins inconspicuous or even difficult to distinguish, greatly affecting the identification accuracy. Therefore, this paper proposes a finger vein image recovery and enhancement algorithm using atmospheric scattering theory. Firstly, to normalize the local over-bright and over-dark regions of finger vein images within a certain threshold, the Gamma transform method is improved in this paper to correct and measure the gray value of a given image. Then, we reconstruct the image based on atmospheric scattering theory and design a pixel mutation filter to segment the venous and non-venous contact zones. Finally, the degraded finger vein images are recovered and enhanced by global image gray value normalization. Experiments on SDUMLA-HMT and ZJ-UVM datasets show that our proposed method effectively achieves the recovery and enhancement of degraded finger vein images. The image restoration and enhancement algorithm proposed in this paper performs well in finger vein recognition using traditional methods, machine learning, and deep learning. The recognition accuracy of the processed image is improved by more than 10% compared to the original image. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
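
The atmospheric scattering model borrowed by the paper treats a degraded image as I(x) = J(x)t(x) + A(1 - t(x)), where J is the clean scene, t the transmission, and A the ambient light; restoration inverts this relation. A minimal sketch with constant, assumed values of t and A (the paper estimates these from the image and uses a more elaborate gamma correction):

```python
import numpy as np

def gamma_correct(img: np.ndarray, gamma: float = 0.8) -> np.ndarray:
    """Simple gamma correction on a float image in [0, 1]."""
    return np.power(np.clip(img, 0.0, 1.0), gamma)

def recover_scene(img: np.ndarray, airlight: float = 0.9,
                  transmission: float = 0.6, t_min: float = 0.1) -> np.ndarray:
    """Invert I = J*t + A*(1 - t) to recover the scene radiance J."""
    t = max(transmission, t_min)          # avoid division blow-up
    j = (img - airlight * (1.0 - t)) / t
    return np.clip(j, 0.0, 1.0)

vein = np.random.rand(128, 128)           # stand-in for a degraded vein image
restored = recover_scene(gamma_correct(vein))
```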

13 pages, 2283 KiB  
Article
Development and Implementation of an Innovative Framework for Automated Radiomics Analysis in Neuroimaging
by Chiara Camastra, Giovanni Pasini, Alessandro Stefano, Giorgio Russo, Basilio Vescio, Fabiano Bini, Franco Marinozzi and Antonio Augimeri
J. Imaging 2024, 10(4), 96; https://doi.org/10.3390/jimaging10040096 - 22 Apr 2024
Viewed by 252
Abstract
Radiomics represents an innovative approach to medical image analysis, enabling comprehensive quantitative evaluation of radiological images through advanced image processing and Machine or Deep Learning algorithms. This technique uncovers intricate data patterns beyond human visual detection. Traditionally, executing a radiomic pipeline involves multiple standardized phases across several software platforms, a limitation that the matRadiomics application was developed to overcome. MatRadiomics, a freely available, IBSI-compliant tool, features an intuitive Graphical User Interface (GUI) that facilitates the entire radiomics workflow from DICOM image importation to segmentation, feature selection and extraction, and Machine Learning model construction. In this project, an extension of matRadiomics was developed to support the importation of brain MRI images and segmentations in NIfTI format, thus extending its applicability to neuroimaging. This enhancement allows for the seamless execution of radiomic pipelines within matRadiomics, offering substantial advantages to the realm of neuroimaging. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

19 pages, 5712 KiB  
Article
Soil Sampling Map Optimization with a Dual Deep Learning Framework
by Tan-Hanh Pham and Kim-Doang Nguyen
Mach. Learn. Knowl. Extr. 2024, 6(2), 751-769; https://doi.org/10.3390/make6020035 - 29 Mar 2024
Viewed by 541
Abstract
Soil sampling constitutes a fundamental process in agriculture, enabling precise soil analysis and optimal fertilization. The automated selection of accurate soil sampling locations representative of a given field is critical for informed soil treatment decisions. This study leverages recent advancements in deep learning to develop efficient tools for generating soil sampling maps. We proposed two models, namely UDL and UFN, which are the results of innovations in machine learning architecture design and integration. The models are meticulously trained on a comprehensive soil sampling dataset collected from local farms in South Dakota. The data include five key attributes: aspect, flow accumulation, slope, normalized difference vegetation index, and yield. The inputs to the models consist of multispectral images, and the ground truths are highly unbalanced binary images. To address this challenge, we innovate a feature extraction technique to find patterns and characteristics from the data before using these refined features for further processing and generating soil sampling maps. Our approach is centered around building a refiner that extracts fine features and a selector that utilizes these features to produce prediction maps containing the selected optimal soil sampling locations. Our experimental results demonstrate the superiority of our tools compared to existing methods. During testing, our proposed models exhibit outstanding performance, achieving the highest mean Intersection over Union of 60.82% and mean Dice Coefficient of 73.74%. The research not only introduces an innovative tool for soil sampling but also lays the foundation for the integration of traditional and modern soil sampling methods. This work provides a promising solution for precision agriculture and soil management. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
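
The mean Intersection over Union and Dice coefficient reported above are standard overlap metrics between a predicted binary sampling map and its ground truth; a minimal sketch of both:

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7):
    """Compute IoU and Dice for two boolean masks of the same shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    iou = inter / (union + eps)
    dice = 2.0 * inter / (pred.sum() + truth.sum() + eps)
    return iou, dice

pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True
truth = np.zeros((64, 64), bool); truth[15:45, 15:45] = True
print(iou_and_dice(pred, truth))
```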

19 pages, 2824 KiB  
Article
Time of Flight Distance Sensor–Based Construction Equipment Activity Detection Method
by Young-Jun Park and Chang-Yong Yi
Appl. Sci. 2024, 14(7), 2859; https://doi.org/10.3390/app14072859 - 28 Mar 2024
Viewed by 474
Abstract
In this study, we delve into a novel approach by employing a sensor-based pattern recognition model to address the automation of construction equipment activity analysis. The model integrates time of flight (ToF) sensors with deep convolutional neural networks (DCNNs) to accurately classify the operational activities of construction equipment, focusing on piston movements. The research utilized a one-twelfth-scale excavator model, processing the displacement ratios of its pistons into a unified dataset for analysis. Methodologically, the study outlines the setup of the sensor modules and their integration with a controller, emphasizing the precision in capturing equipment dynamics. The DCNN model, characterized by its four-layered convolutional blocks, was meticulously tuned within the MATLAB environment, demonstrating the model’s learning capabilities through hyperparameter optimization. An analysis of 2070 samples representing six distinct excavator activities yielded an impressive average precision of 95.51% and a recall of 95.31%, with an overall model accuracy of 95.19%. When compared against other vision-based and accelerometer-based methods, the proposed model showcases enhanced performance and reliability under controlled experimental conditions. This substantiates its potential for practical application in real-world construction scenarios, marking a significant advancement in the field of construction equipment monitoring. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
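
The abstract specifies a DCNN with four convolutional blocks over piston-displacement ratios but not the exact layer configuration, so the following PyTorch sketch is only a plausible shape (the channel widths, kernel size, and number of input channels are assumptions; the six-class output follows the abstract):

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    """One convolutional block: conv -> batch norm -> ReLU -> pool."""
    return nn.Sequential(
        nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm1d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool1d(2),
    )

class ActivityNet(nn.Module):
    def __init__(self, n_sensors: int = 4, n_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(n_sensors, 16), conv_block(16, 32),
            conv_block(32, 64), conv_block(64, 128),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                  nn.Linear(128, n_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

logits = ActivityNet()(torch.randn(8, 4, 256))  # batch of 8 signal windows
```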

19 pages, 7570 KiB  
Article
Semantic Segmentation of Remote Sensing Images Depicting Environmental Hazards in High-Speed Rail Network Based on Large-Model Pre-Classification
by Qi Dong, Xiaomei Chen, Lili Jiang, Lin Wang, Jiachong Chen and Ying Zhao
Sensors 2024, 24(6), 1876; https://doi.org/10.3390/s24061876 - 14 Mar 2024
Viewed by 462
Abstract
With the rapid development of China’s railways, ensuring the safety of the operating environment of high-speed railways faces daunting challenges. In response to safety hazards posed by light and heavy floating objects during the operation of trains, we propose a dual-branch semantic segmentation network with the fusion of large models (SAMUnet). The encoder part of this network uses a dual-branch structure, in which the backbone branch uses a residual network for feature extraction and the large-model branch leverages the results of feature extraction generated by the segment anything model (SAM). Moreover, a decoding attention module is fused with the results of prediction of the SAM in the decoder part to enhance the performance of the network. We conducted experiments on the Inria Aerial Image Labeling (IAIL), Massachusetts, and high-speed railway hazards datasets to verify the effectiveness and applicability of the proposed SAMUnet network in comparison with commonly used semantic segmentation networks. The results demonstrated its superiority in terms of both the accuracies of segmentation and feature extraction. It was able to precisely extract hazards in the environment of high-speed railways to significantly improve the accuracy of semantic segmentation. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

14 pages, 3336 KiB  
Article
Dazzling Evaluation of the Impact of a High-Repetition-Rate CO2 Pulsed Laser on Infrared Imaging Systems
by Hanyu Zheng, Yunzhe Wang, Yang Liu, Tao Sun and Junfeng Shao
Sensors 2024, 24(6), 1827; https://doi.org/10.3390/s24061827 - 12 Mar 2024
Viewed by 408
Abstract
This article utilizes the Canny edge extraction algorithm based on contour curvature and the cross-correlation template matching algorithm to extensively study the impact of a high-repetition-rate CO2 pulsed laser on the target extraction and tracking performance of an infrared imaging detector. It establishes a quantified dazzling pattern for lasers on infrared imaging systems. By conducting laser dazzling and damage experiments, a detailed analysis of the normalized correlation between the target and the dazzling images is performed to quantitatively describe the laser dazzling effects. Simultaneously, an evaluation system, including target distance and laser power evaluation factors, is established to determine the dazzling level and whether the target is recognizable. The research results reveal that the laser power and target position are crucial factors affecting the detection performance of infrared imaging detector systems under laser dazzling. Different laser powers are required to successfully interfere with the recognition algorithm of the infrared imaging detector at different distances. And laser dazzling produces a considerable quantity of false edge information, which seriously affects the performance of the pattern recognition algorithm. In laser damage experiments, the detector experienced functional damage, with a quarter of the image displaying as completely black. The energy density threshold required for the functional damage of the detector is approximately 3 J/cm2. The dazzling assessment conclusions also apply to the evaluation of the damage results. Finally, the proposed evaluation formula aligns with the experimental results, objectively reflecting the actual impact of laser dazzling on the target extraction and the tracking performance of infrared imaging systems. This study provides an in-depth and accurate analysis for understanding the influence of lasers on the performance of infrared imaging detectors. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
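
The dazzling score above rests on the normalized correlation between the clean target image and the dazzled frame; a minimal zero-mean normalized cross-correlation sketch:

```python
import numpy as np

def normalized_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equal-size images."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

target = np.random.rand(64, 64)
dazzled = np.clip(target + 0.5 * np.random.rand(64, 64), 0, 1)
print(normalized_correlation(target, dazzled))  # closer to 1 = less disrupted
```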

23 pages, 3795 KiB  
Article
Classifying Breast Tumors in Digital Tomosynthesis by Combining Image Quality-Aware Features and Tumor Texture Descriptors
by Loay Hassan, Mohamed Abdel-Nasser, Adel Saleh and Domenec Puig
Mach. Learn. Knowl. Extr. 2024, 6(1), 619-641; https://doi.org/10.3390/make6010029 - 11 Mar 2024
Viewed by 865
Abstract
Digital breast tomosynthesis (DBT) is a 3D breast cancer screening technique that can overcome the limitations of standard 2D digital mammography. However, DBT images often suffer from artifacts stemming from acquisition conditions, a limited angular range, and low radiation doses. These artifacts have the potential to degrade the performance of automated breast tumor classification tools. Notably, most existing automated breast tumor classification methods do not consider the effect of DBT image quality when designing the classification models. In contrast, this paper introduces a novel deep learning-based framework for classifying breast tumors in DBT images. This framework combines global image quality-aware features with tumor texture descriptors. The proposed approach employs a two-branch model: in the top branch, a deep convolutional neural network (CNN) model is trained to extract robust features from the region of interest that includes the tumor. In the bottom branch, a deep learning model named TomoQA is trained to extract global image quality-aware features from input DBT images. The quality-aware features and the tumor descriptors are then combined and fed into a fully-connected layer to classify breast tumors as benign or malignant. The unique advantage of this model is the combination of DBT image quality-aware features with tumor texture descriptors, which helps accurately classify breast tumors as benign or malignant. Experimental results on a publicly available DBT image dataset demonstrate that the proposed framework achieves superior breast tumor classification results, outperforming all existing deep learning-based methods. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

18 pages, 5765 KiB  
Article
Real-Time Cucumber Target Recognition in Greenhouse Environments Using Color Segmentation and Shape Matching
by Wenbo Liu, Haonan Sun, Yu Xia and Jie Kang
Appl. Sci. 2024, 14(5), 1884; https://doi.org/10.3390/app14051884 - 25 Feb 2024
Viewed by 445
Abstract
Accurate identification of fruits in greenhouse environments is an essential need for the precise functioning of agricultural robots. This study presents a solution to the problem of distinguishing cucumber fruits from their stems and leaves, which often have similar colors in their natural environment. The proposed algorithm for cucumber fruit identification relies on color segmentation and form matching. First, we get the boundary details from the acquired image of the cucumber sample. The edge information is described and reconstructed by utilizing a shape descriptor known as the Fourier descriptor in order to acquire a matching template image. Subsequently, we generate a multi-scale template by amalgamating computational and real-world data. The target image is subjected to color conditioning in order to enhance the segmentation of the target region inside the HSV color space. Then, the segmented target region is compared to the multi-scale template based on its shape. The method of color segmentation decreases the presence of unwanted information in the target image, hence improving the effectiveness of shape matching. An analysis was performed on a set of 200 cucumber photos that were obtained from the field. The findings indicate that the method presented in this study surpasses conventional recognition algorithms in terms of accuracy and efficiency, with a recognition rate of up to 86%. Moreover, the system has exceptional proficiency in identifying cucumber targets within greenhouses. This attribute renders it a great resource for offering technical assistance to agricultural robots that operate with accuracy. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
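
The pipeline pairs HSV color segmentation with Fourier descriptors of the object contour; a compact OpenCV sketch of both steps (the green HSV bounds and descriptor length are illustrative assumptions, not the paper's values):

```python
import cv2
import numpy as np

def segment_green(bgr: np.ndarray) -> np.ndarray:
    """Threshold roughly green pixels in HSV space (bounds are assumptions)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    return cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))

def fourier_descriptor(mask: np.ndarray, n_coeffs: int = 16) -> np.ndarray:
    """Scale/translation-normalized Fourier descriptor of the largest contour."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    pts = max(contours, key=cv2.contourArea).squeeze(1)
    z = pts[:, 0] + 1j * pts[:, 1]          # contour as a complex signal
    spectrum = np.fft.fft(z - z.mean())     # subtracting the mean drops translation
    mags = np.abs(spectrum[1:n_coeffs + 1])
    return mags / (mags[0] + 1e-9)          # dividing by the first drops scale

img = np.zeros((100, 100, 3), np.uint8)
cv2.circle(img, (50, 50), 30, (60, 200, 60), -1)  # a green blob as a stand-in
print(fourier_descriptor(segment_green(img))[:4])
```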

19 pages, 587 KiB  
Article
CRAS: Curriculum Regularization and Adaptive Semi-Supervised Learning with Noisy Labels
by Ryota Higashimoto, Soh Yoshida and Mitsuji Muneyasu
Appl. Sci. 2024, 14(3), 1208; https://doi.org/10.3390/app14031208 - 31 Jan 2024
Viewed by 492
Abstract
This paper addresses the performance degradation of deep neural networks caused by learning with noisy labels. Recent research on this topic has exploited the memorization effect: networks fit data with clean labels during the early stages of learning and eventually memorize data with noisy labels. This property allows for the separation of clean and noisy samples from a loss distribution. In recent years, semi-supervised learning, which divides training data into a set of labeled clean samples and a set of unlabeled noisy samples, has achieved impressive results. However, this strategy has two significant problems: (1) the accuracy of dividing the data into clean and noisy samples depends strongly on the network’s performance, and (2) if the divided data are biased towards the unlabeled samples, there are few labeled samples, causing the network to overfit to the labels and leading to a poor generalization performance. To solve these problems, we propose the curriculum regularization and adaptive semi-supervised learning (CRAS) method. Its key ideas are (1) to train the network with robust regularization techniques as a warm-up before dividing the data, and (2) to control the strength of the regularization using loss weights that adaptively respond to data bias, which varies with each split at each training epoch. We evaluated the performance of CRAS on benchmark image classification datasets, CIFAR-10 and CIFAR-100, and real-world datasets, mini-WebVision and Clothing1M. The findings demonstrate that CRAS excels in handling noisy labels, resulting in a superior generalization and robustness to a range of noise rates, compared with the existing method. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

16 pages, 3585 KiB  
Article
Enhancement of GUI Display Error Detection Using Improved Faster R-CNN and Multi-Scale Attention Mechanism
by Xi Pan, Zhan Huan, Yimang Li and Yingying Cao
Appl. Sci. 2024, 14(3), 1144; https://doi.org/10.3390/app14031144 - 30 Jan 2024
Viewed by 745
Abstract
Graphical user interfaces (GUIs) hold an irreplaceable position in modern software and applications. Users can interact through them. Due to different terminal devices, there are sometimes display errors, such as component occlusion, image loss, text overlap, and empty values during software rendering. To address the aforementioned common four GUI display errors, a target detection algorithm based on the improved Faster R-CNN is proposed. Specifically, ResNet-50 is used instead of the traditional VGG-16 as the feature extraction network. The feature pyramid network (FPN) and the enhanced multi-scale attention (EMA) algorithm are introduced to improve accuracy. ROI-Align is used instead of ROI-Pooling to enhance the generalization capability of the network. Since training models require a large number of labeled screenshots of errors, there is currently no publicly available dataset with GUI display problems. Therefore, a training data generation algorithm has been developed, which can automatically generate screenshots with GUI display problems based on the Rico dataset. Experimental results show that the improved Faster R-CNN achieves a detection accuracy of 87.3% in the generated GUI problem dataset, which is a 7% improvement compared to the previous version. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
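
The baseline components named above (ResNet-50 backbone, FPN, ROI-Align) are what torchvision's stock Faster R-CNN already provides, so a minimal fine-tuning sketch for the four display-error classes plus background looks like this (a plain torchvision recipe, not the authors' EMA-augmented network):

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Stock ResNet-50 + FPN detector; torchvision's implementation already
# uses RoIAlign for region feature extraction.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the box classification head: 4 GUI display-error classes + background.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=5)

model.eval()
with torch.no_grad():
    detections = model([torch.rand(3, 480, 640)])  # one screenshot-sized image
print(detections[0]["boxes"].shape)
```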

16 pages, 3370 KiB  
Article
Deep Learning-Based Technique for Remote Sensing Image Enhancement Using Multiscale Feature Fusion
by Ming Zhao, Rui Yang, Min Hu and Botao Liu
Sensors 2024, 24(2), 673; https://doi.org/10.3390/s24020673 - 21 Jan 2024
Viewed by 872
Abstract
The present study proposes a novel deep-learning model for remote sensing image enhancement. It maintains image details while enhancing brightness in the feature extraction module. An improved hierarchical model named Global Spatial Attention Network (GSA-Net), based on U-Net for image enhancement, is proposed to improve the model’s performance. To circumvent the issue of insufficient sample data, gamma correction is applied to create low-light images, which are then used as training examples. A loss function is constructed using the Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR) indices. The GSA-Net network and loss function are utilized to restore images obtained via low-light remote sensing. This proposed method was tested on the Northwestern Polytechnical University Very-High-Resolution 10 (NWPU VHR-10) dataset, and its overall superiority was demonstrated in comparison with other state-of-the-art algorithms using various objective assessment indicators, such as PSNR, SSIM, and Learned Perceptual Image Patch Similarity (LPIPS). Furthermore, in high-level visual tasks such as object detection, this novel method provides better remote sensing images with distinct details and higher contrast than the competing methods. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
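
Low-light training pairs are synthesized by darkening well-lit images with gamma correction; a minimal sketch (the gamma range is an assumption):

```python
import numpy as np

def synthesize_low_light(img: np.ndarray, rng: np.random.Generator,
                         gamma_range=(2.0, 5.0)) -> np.ndarray:
    """Darken a [0, 1] image with a random gamma > 1 to simulate low light."""
    gamma = rng.uniform(*gamma_range)
    return np.power(np.clip(img, 0.0, 1.0), gamma)

rng = np.random.default_rng(0)
bright = np.random.rand(256, 256, 3)
dark = synthesize_low_light(bright, rng)   # (dark, bright) = training pair
```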

19 pages, 20196 KiB  
Article
Synthetic Document Images with Diverse Shadows for Deep Shadow Removal Networks
by Yuhi Matsuo and Yoshimitsu Aoki
Sensors 2024, 24(2), 654; https://doi.org/10.3390/s24020654 - 19 Jan 2024
Viewed by 756
Abstract
Shadow removal for document images is an essential task for digitized document applications. Recent shadow removal models have been trained on pairs of shadow images and shadow-free images. However, obtaining a large, diverse dataset for document shadow removal takes time and effort. Thus, only small real datasets are available. Graphic renderers have been used to synthesize shadows to create relatively large datasets. However, the limited number of unique documents and the limited lighting environments adversely affect the network performance. This paper presents a large-scale, diverse dataset called the Synthetic Document with Diverse Shadows (SynDocDS) dataset. The SynDocDS comprises rendered images with diverse shadows augmented by a physics-based illumination model, which can be utilized to obtain a more robust and high-performance deep shadow removal network. In this paper, we further propose a Dual Shadow Fusion Network (DSFN). Unlike natural images, document images often have constant background colors requiring a high understanding of global color features for training a deep shadow removal network. The DSFN has a high global color comprehension and understanding of shadow regions and merges shadow attentions and features efficiently. We conduct experiments on three publicly available datasets, the OSR, Kligler’s, and Jung’s datasets, to validate our proposed method’s effectiveness. In comparison to training on existing synthetic datasets, our model training on the SynDocDS dataset achieves an enhancement in the PSNR and SSIM, increasing them from 23.00 dB to 25.70 dB and 0.959 to 0.971 on average. In addition, the experiments demonstrated that our DSFN clearly outperformed other networks across multiple metrics, including the PSNR, the SSIM, and its impact on OCR performance. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

15 pages, 11956 KiB  
Article
Decomposition Technique for Bio-Transmittance Imaging Based on Attenuation Coefficient Matrix Inverse
by Purnomo Sidi Priambodo, Toto Aminoto and Basari Basari
J. Imaging 2024, 10(1), 22; https://doi.org/10.3390/jimaging10010022 - 15 Jan 2024
Viewed by 1264
Abstract
Human body tissue disease diagnosis will become more accurate if transmittance images, such as X-ray images, are separated according to each constituent tissue. This research proposes a new image decomposition technique based on the matrix inverse method for biological tissue images. The fundamental idea of this research is based on the fact that when k different monochromatic lights penetrate a biological tissue, they will experience different attenuation coefficients. Furthermore, the same happens when monochromatic light penetrates k different biological tissues, as they will also experience different attenuation coefficients. The various attenuation coefficients are arranged into a unique k×k-dimensional square matrix. k-many images taken by k-many different monochromatic lights are then merged into an image vector entity; further, a matrix inverse operation is performed on the merged image, producing N-many tissue thickness images of the constituent tissues. This research demonstrates that the proposed method effectively decomposes images of biological objects into separate images, each showing the thickness distributions of different constituent tissues. In the future, this proposed new technique is expected to contribute to supporting medical imaging analysis. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
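
Under the Beer-Lambert attenuation law the measured intensities satisfy I_k = I_0 exp(-Σ_j μ_kj d_j), so taking logarithms turns tissue decomposition into a linear solve against the k×k attenuation matrix. A minimal NumPy sketch for k = 2 tissues (the coefficients are invented for illustration):

```python
import numpy as np

# Assumed attenuation coefficients mu[k, j]: wavelength k, tissue j (1/cm).
mu = np.array([[0.35, 0.10],
               [0.08, 0.30]])

true_thickness = np.array([1.5, 0.8])            # cm, per tissue
i0 = 1.0                                         # incident intensity

# Forward model: I_k = I0 * exp(-sum_j mu[k, j] * d_j)
measured = i0 * np.exp(-mu @ true_thickness)

# Decomposition: solve mu @ d = ln(I0 / I), applied per pixel in practice.
recovered = np.linalg.solve(mu, np.log(i0 / measured))
print(recovered)  # -> [1.5, 0.8]
```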

28 pages, 22846 KiB  
Article
Predicting Wind Comfort in an Urban Area: A Comparison of a Regression- with a Classification-CNN for General Wind Rose Statistics
by Jennifer Werner, Dimitri Nowak, Franziska Hunger, Tomas Johnson, Andreas Mark, Alexander Gösta and Fredrik Edelvik
Mach. Learn. Knowl. Extr. 2024, 6(1), 98-125; https://doi.org/10.3390/make6010006 - 04 Jan 2024
Cited by 1 | Viewed by 1830
Abstract
Wind comfort is an important factor when new buildings in existing urban areas are planned. It is common practice to use computational fluid dynamics (CFD) simulations to model wind comfort. These simulations are usually time-consuming, making it impossible to explore a high number of different design choices for a new urban development with wind simulations. Data-driven approaches based on simulations have shown great promise, and have recently been used to predict wind comfort in urban areas. These surrogate models could be used in generative design software and would enable the planner to explore a large number of options for a new design. In this paper, we propose a novel machine learning workflow (MLW) for direct wind comfort prediction. The MLW incorporates a regression and a classification U-Net, trained based on CFD simulations. Furthermore, we present an augmentation strategy focusing on generating more training data independent of the underlying wind statistics needed to calculate the wind comfort criterion. We train the models based on different sets of training data and compare the results. All trained models (regression and classification) yield an F1-score greater than 80% and can be combined with any wind rose statistic. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

13 pages, 1658 KiB  
Article
A Robust Machine Learning Model for Diabetic Retinopathy Classification
by Gigi Tăbăcaru, Simona Moldovanu, Elena Răducan and Marian Barbu
J. Imaging 2024, 10(1), 8; https://doi.org/10.3390/jimaging10010008 - 28 Dec 2023
Cited by 1 | Viewed by 1412
Abstract
Ensemble learning is a process that belongs to the artificial intelligence (AI) field. It helps to choose a robust machine learning (ML) model, usually used for data classification. AI has a large connection with image processing and feature classification, and it can also be successfully applied to analyzing fundus eye images. Diabetic retinopathy (DR) is a disease that can cause vision loss and blindness, which, from an imaging point of view, can be shown when screening the eyes. Image processing tools can analyze and extract the features from fundus eye images, and these are combined with ML classifiers that perform classification among different disease classes. The outcomes integrated into automated diagnostic systems can be a real success for physicians and patients. In this study, in the image processing stage, contrast manipulation with the gamma correction parameter was applied because DR affects the blood vessels and the structure of the eye becomes disordered. Therefore, the analysis of the texture with two types of entropies was necessary. Shannon and fuzzy entropies and contrast manipulation led to ten original features used in the classification process. The machine learning library PyCaret performs complex tasks, and the empirical process shows that of the fifteen classifiers, the gradient boosting classifier (GBC) provides the best results. Indeed, the proposed model can classify the DR degrees as normal or severe, achieving an accuracy of 0.929, an F1 score of 0.902, and an area under the curve (AUC) of 0.941. The validation of the selected model with a bootstrap statistical technique was performed. The novelty of the study consists of the extraction of features from preprocessed fundus eye images, their classification, and the manipulation of the contrast in a controlled way. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
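
Shannon entropy of the gray-level distribution is one of the texture features described above; a minimal sketch (the fuzzy entropy variant is omitted):

```python
import numpy as np

def shannon_entropy(gray: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of an image's gray-level distribution."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    p = p[p > 0]                          # convention: 0 * log(0) = 0
    return float(-(p * np.log2(p)).sum())

img = np.random.randint(0, 256, (128, 128))
print(shannon_entropy(img))               # close to 8 bits for uniform noise
```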

17 pages, 3275 KiB  
Article
A Dual-Tree–Complex Wavelet Transform-Based Infrared and Visible Image Fusion Technique and Its Application in Tunnel Crack Detection
by Feng Wang and Tielin Chen
Appl. Sci. 2024, 14(1), 114; https://doi.org/10.3390/app14010114 - 22 Dec 2023
Viewed by 542
Abstract
Computer vision methods have been widely used in recent years for the detection of structural cracks. To address the issues of poor image quality and the inadequate performance of semantic segmentation networks under low-light conditions in tunnels, in this paper, infrared images are used, and a preprocessing method based on image fusion technology is developed. First, the DAISY descriptor and the perspective transform are applied for image alignment. Then, the source image is decomposed into high- and low-frequency components of different scales and directions using DT-CWT, and high- and low-frequency subband fusion rules are designed according to the characteristics of infrared and visible images. Finally, a fused image is reconstructed from the processed coefficients, and the fusion results are evaluated using the improved semantic segmentation network. The results show that using the proposed fusion method to preprocess images leads to a low false alarm rate and low missed detection rate in comparison to those using the source image directly or using the classical fusion algorithm. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
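
A common DT-CWT fusion recipe, plausibly the skeleton of the rules designed above, averages the low-frequency bands and keeps the larger-magnitude complex coefficient in each high-frequency subband. A sketch using the Python `dtcwt` package (the paper's subband rules are more elaborate):

```python
import numpy as np
import dtcwt

def fuse_dtcwt(infrared: np.ndarray, visible: np.ndarray, nlevels: int = 4):
    """Fuse two registered grayscale images in the DT-CWT domain."""
    transform = dtcwt.Transform2d()
    a = transform.forward(infrared, nlevels=nlevels)
    b = transform.forward(visible, nlevels=nlevels)

    lowpass = 0.5 * (a.lowpass + b.lowpass)         # average low frequencies
    highpasses = tuple(
        np.where(np.abs(ha) >= np.abs(hb), ha, hb)  # max-magnitude rule
        for ha, hb in zip(a.highpasses, b.highpasses)
    )
    return transform.inverse(dtcwt.Pyramid(lowpass, highpasses))

ir = np.random.rand(128, 128)
vis = np.random.rand(128, 128)
fused = fuse_dtcwt(ir, vis)
```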

17 pages, 2628 KiB  
Article
High-Precision Carton Detection Based on Adaptive Image Augmentation for Unmanned Cargo Handling Tasks
by Bing Liang, Xin Wang, Wenhao Zhao and Xiaobang Wang
Sensors 2024, 24(1), 12; https://doi.org/10.3390/s24010012 - 19 Dec 2023
Viewed by 653
Abstract
Unattended intelligent cargo handling is an important means to improve the efficiency and safety of port cargo trans-shipment, where high-precision carton detection is an unquestioned prerequisite. Therefore, this paper introduces an adaptive image augmentation method for high-precision carton detection. First, the imaging parameters of the images are clustered into various scenarios, and the imaging parameters and perspectives are adaptively adjusted to achieve the automatic augmenting and balancing of the carton dataset in each scenario, which reduces the interference of the scenarios on the carton detection precision. Then, the carton boundary features are extracted and stochastically sampled to synthesize new images, thus enhancing the detection performance of the trained model for dense cargo boundaries. Moreover, the weight function of the hyperparameters of the trained model is constructed to achieve their preferential crossover during genetic evolution to ensure the training efficiency of the augmented dataset. Finally, an intelligent cargo handling platform is developed and field experiments are conducted. The outcomes of the experiments reveal that the method attains a detection precision of 0.828. This technique significantly enhances the detection precision by 18.1% and 4.4% when compared to the baseline and other methods, which provides a reliable guarantee for intelligent cargo handling processes. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

18 pages, 10650 KiB  
Article
Zero-Shot Traffic Sign Recognition Based on Midlevel Feature Matching
by Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama
Sensors 2023, 23(23), 9607; https://doi.org/10.3390/s23239607 - 04 Dec 2023
Viewed by 1116
Abstract
Traffic sign recognition is a complex and challenging yet popular problem that can assist drivers on the road and reduce traffic accidents. Most existing methods for traffic sign recognition use convolutional neural networks (CNNs) and can achieve high recognition accuracy. However, these methods first require a large number of carefully crafted traffic sign datasets for the training process. Moreover, since traffic signs differ in each country and there is a variety of traffic signs, these methods need to be fine-tuned when recognizing new traffic sign categories. To address these issues, we propose a traffic sign matching method for zero-shot recognition. Our proposed method can perform traffic sign recognition without training data by directly matching the similarity of target and template traffic sign images. Our method uses the midlevel features of CNNs to obtain robust feature representations of traffic signs without additional training or fine-tuning. We discovered that midlevel features improve the accuracy of zero-shot traffic sign recognition. The proposed method achieves promising recognition results on the German Traffic Sign Recognition Benchmark open dataset and a real-world dataset taken from Sapporo City, Japan. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
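
Zero-shot matching of this kind reduces to comparing midlevel CNN activations of a target sign against class templates by similarity; a sketch with a torchvision ResNet truncated at an intermediate stage (the backbone and cut-off layer are assumptions, since the paper studies which depth works best):

```python
import torch
import torchvision

# Backbone truncated after an intermediate (midlevel) stage -- an assumption.
resnet = torchvision.models.resnet18(weights="DEFAULT")
midlevel = torch.nn.Sequential(*list(resnet.children())[:6])  # through layer2
midlevel.eval()

def embed(img: torch.Tensor) -> torch.Tensor:
    """L2-normalized midlevel feature vector for a (3, 224, 224) image."""
    with torch.no_grad():
        feat = midlevel(img.unsqueeze(0))
    return torch.nn.functional.normalize(feat.flatten(1), dim=1)

templates = torch.rand(10, 3, 224, 224)       # one per known sign class
target = torch.rand(3, 224, 224)
sims = torch.cat([embed(t) for t in templates]) @ embed(target).T
print(int(sims.argmax()))                     # best-matching template index
```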

23 pages, 6299 KiB  
Article
Two-Stage Pedestrian Detection Model Using a New Classification Head for Domain Generalization
by Daniel Schulz and Claudio A. Perez
Sensors 2023, 23(23), 9380; https://doi.org/10.3390/s23239380 - 24 Nov 2023
Cited by 1 | Viewed by 906
Abstract
Pedestrian detection based on deep learning methods has achieved great success in the past few years with several possible real-world applications including autonomous driving, robotic navigation, and video surveillance. In this work, a new neural network two-stage pedestrian detector with a new custom classification head, adding the triplet loss function to the standard bounding box regression and classification losses, is presented. This aims to improve the domain generalization capabilities of existing pedestrian detectors, by explicitly maximizing inter-class distance and minimizing intra-class distance. Triplet loss is applied to the features generated by the region proposal network, aimed at clustering together pedestrian samples in the feature space. We used Faster R-CNN and Cascade R-CNN with the HRNet backbone pre-trained on ImageNet, changing the standard classification head for Faster R-CNN, and changing one of the three heads for Cascade R-CNN. The best results were obtained using a progressive training pipeline, starting from a dataset that is further away from the target domain, and progressively fine-tuning on datasets closer to the target domain. We obtained state-of-the-art results, MR2 of 9.9, 11.0, and 36.2 for the reasonable, small, and heavy subsets on the CityPersons benchmark with outstanding performance on the heavy subset, the most difficult one. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
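
PyTorch ships the triplet building block used in the custom head, so the combined objective can be sketched as below (the loss weight and margin are assumptions, not the authors' settings):

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)

def detector_loss(cls_loss: torch.Tensor, box_loss: torch.Tensor,
                  anchor: torch.Tensor, positive: torch.Tensor,
                  negative: torch.Tensor, w: float = 0.5) -> torch.Tensor:
    """Classification + box regression losses plus a weighted triplet term.

    anchor/positive are pedestrian proposal embeddings and negative is a
    background embedding; the weight w is an assumption, not the paper's.
    """
    return cls_loss + box_loss + w * triplet(anchor, positive, negative)

a, p, n = torch.rand(16, 256), torch.rand(16, 256), torch.rand(16, 256)
print(detector_loss(torch.tensor(0.7), torch.tensor(0.3), a, p, n))
```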

19 pages, 47409 KiB  
Article
Ore Rock Fragmentation Calculation Based on Multi-Modal Fusion of Point Clouds and Images
by Jianjun Peng, Yunhao Cui, Zhidan Zhong and Yi An
Appl. Sci. 2023, 13(23), 12558; https://doi.org/10.3390/app132312558 - 21 Nov 2023
Viewed by 671
Abstract
The accurate calculation of ore rock fragmentation is important for achieving the autonomous mining operation of mine excavators. However, a single mode cannot accurately calculate the ore rock fragmentation due to the low resolution of the point cloud mode and the lack of spatial position information of the image mode. To solve this problem, we propose an ore rock fragmentation calculation method (ORFCM) based on the multi-modal fusion of point clouds and images. The ORFCM makes full use of the advantages of multi-modal data, including the fine-grained object segmentation of images and spatial location information of point clouds. To solve the problem of image under-segmentation, we propose a multiscale adaptive edge-detection method based on an innovative standard deviation map to enhance the weak edges. Furthermore, an improved marked watershed segmentation algorithm is proposed to solve the problem of low segmentation accuracy caused by excessive noise of the gradient map and weak edges submerged. Experiments demonstrate that ORFCM can accurately calculate ore rock fragmentation in the local excavation area without relying on external markers for pixel calibration. The average error of the equivalent diameter of ore rock blocks is 0.66 cm, the average error of the elliptical long diameter is 1.42 cm, and the average error of the elliptical short diameter is 1.06 cm, which can effectively meet practical engineering needs. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

13 pages, 867 KiB  
Article
A Four-Stage Mahalanobis-Distance-Based Method for Hand Posture Recognition
by Dawid Warchoł and Tomasz Kapuściński
Appl. Sci. 2023, 13(22), 12347; https://doi.org/10.3390/app132212347 - 15 Nov 2023
Viewed by 550
Abstract
Automatic recognition of hand postures is an important research topic with many applications, e.g., communication support for deaf people. In this paper, we present a novel four-stage, Mahalanobis-distance-based method for hand posture recognition using skeletal data. The proposed method is based on a two-stage classification algorithm with two additional stages related to joint preprocessing (normalization) and a rule-based system, specific to hand shapes that the algorithm is meant to classify. The method achieves superior effectiveness on two benchmark datasets, the first of which was created by us for the purpose of this work, while the second is a well-known and publicly available dataset. The method’s recognition rate measured by leave-one-subject-out cross-validation tests is 94.69% on the first dataset and 97.44% on the second. Experiments, including comparison with other state-of-the-art methods and ablation studies related to classification accuracy and time, confirm the effectiveness of our approach. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
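
At its core, Mahalanobis-distance classification assigns a posture to the class whose feature distribution is nearest under that class's own covariance; a compact sketch of this classification stage (the normalization and rule-based stages of the four-stage pipeline are omitted):

```python
import numpy as np

class MahalanobisClassifier:
    """Nearest-class-mean classifier under per-class Mahalanobis distance."""

    def fit(self, X: np.ndarray, y: np.ndarray):
        self.classes_ = np.unique(y)
        self.means_, self.inv_covs_ = [], []
        for c in self.classes_:
            Xc = X[y == c]
            self.means_.append(Xc.mean(axis=0))
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.inv_covs_.append(np.linalg.inv(cov))
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        d = np.stack([
            np.einsum("ij,jk,ik->i", X - m, ic, X - m)  # squared distances
            for m, ic in zip(self.means_, self.inv_covs_)
        ], axis=1)
        return self.classes_[d.argmin(axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.repeat([0, 1], 50)
print(MahalanobisClassifier().fit(X, y).predict(X[:5]))
```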

23 pages, 4942 KiB  
Article
Anomaly Detection in the Production Process of Stamping Progressive Dies Using the Shape- and Size-Adaptive Descriptors
by Liang Ma and Fanwu Meng
Sensors 2023, 23(21), 8904; https://doi.org/10.3390/s23218904 - 01 Nov 2023
Viewed by 942
Abstract
In the production process of progressive die stamping, anomaly detection is essential for ensuring the safety of expensive dies and the continuous stability of the production process. Early monitoring processes involve manually inspecting the quality of post-production products to infer whether there are anomalies in the production process, or using some sensors to monitor some state signals during the production process. However, the former is an extremely tedious and time-consuming task, and the latter cannot provide warnings before anomalies occur. Both methods can only detect anomalies after they have occurred, which usually means that damage to the die has already been caused. In this paper, we propose a machine-vision-based method for real-time anomaly detection in the production of progressive die stamping. This method can detect anomalies before they cause actual damage to the mold, thereby stopping the machine and protecting the mold and machine. In the proposed method, a whole continuous motion scene cycle is decomposed into a standard background template library, and the potential anomaly regions in the image to be detected are determined according to the difference from the background template library. Finally, the shape- and size-adaptive descriptors of these regions and corresponding reference regions are extracted and compared to determine the actual anomaly regions. The experimental results demonstrate that this method not only achieves satisfactory accuracy in anomaly detection during the production of progressive die stamping but also attains competitive performance levels when compared with methods based on deep learning. Furthermore, it requires simpler preliminary preparations and does not necessitate the adoption of the deep learning paradigm. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

12 pages, 3017 KiB  
Article
Target Localization and Grasping of Parallel Robots with Multi-Vision Based on Improved RANSAC Algorithm
by Ruizhen Gao, Yang Li, Zhiqiang Liu and Shuai Zhang
Appl. Sci. 2023, 13(20), 11302; https://doi.org/10.3390/app132011302 - 14 Oct 2023
Viewed by 890
Abstract
Some traditional robots are based on offline programming reciprocal motion, and with the continuous upgrades in vision technology, more and more tasks are being replaced with machine vision. At present, the main method of target recognition used in palletizers is the traditional SURF algorithm, but this method of grasping leads to low accuracy due to the influence of too many mis-matched points. Because the accuracy of binocular-vision-based robot target localization is low, an improved random sample consensus (RANSAC) algorithm for complete parallel robot target localization and grasping under the guidance of multi-vision is proposed. Firstly, the improved RANSAC algorithm was created based on the SURF algorithm; next, the parallax gradient method was applied to iterate the matched point pairs several times to further optimize the data; then, the 3D reconstruction was completed using the improved algorithm; finally, the obtained data were input into the robot arm, and the camera's internal and external parameters were obtained using the calibration method so that the robot could accurately locate and grasp objects. The experiments show that the improved algorithm achieves better recognition accuracy and grasping success with the multi-vision approach. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
20 pages, 65655 KiB  
Article
A Spatially Guided Machine-Learning Method to Classify and Quantify Glomerular Patterns of Injury in Histology Images
by Justinas Besusparis, Mindaugas Morkunas and Arvydas Laurinavicius
J. Imaging 2023, 9(10), 220; https://doi.org/10.3390/jimaging9100220 - 11 Oct 2023
Viewed by 1172
Abstract
Introduction: The diagnosis of glomerular diseases is primarily based on visual assessment of histologic patterns. Semi-quantitative scoring of active and chronic lesions is often required to assess individual characteristics of the disease. Reproducibility of the visual scoring systems remains debatable, while digital and machine-learning technologies present opportunities to detect, classify and quantify glomerular lesions, also considering their inter- and intraglomerular heterogeneity. Materials and methods: We performed a cross-validated comparison of three modifications of a convolutional neural network (CNN)-based approach for recognition and intraglomerular quantification of nine main glomerular patterns of injury. Reference values provided by two nephropathologists were used for validation. For each glomerular image, visual attention heatmaps were generated with a probability of class attribution for further intraglomerular quantification. The quality of classifier-produced heatmaps was evaluated using the intersection over union metric (IoU) between predicted and ground-truth localization heatmaps. Results: The proposed spatially guided modification of the CNN classifier achieved the highest glomerular pattern classification accuracies, with area under curve (AUC) values up to 0.981. With regard to heatmap overlap area and intraglomerular pattern quantification, the spatially guided classifier achieved a significantly higher generalized mean IoU value compared to single-multiclass and multiple-binary classifiers. Conclusions: We propose a spatially guided CNN classifier that, in our experiments, shows the potential to achieve high accuracy in localizing intraglomerular patterns. Full article
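The IoU metric used to score the heatmaps can be computed as below; this is a generic sketch assuming the predicted and ground-truth maps are probability arrays of equal shape, with the 0.5 binarization threshold as an assumption:

```python
import numpy as np

def heatmap_iou(pred, truth, thresh=0.5):
    """Intersection-over-Union between a predicted attention heatmap and a
    ground-truth localization map, both given as arrays of probabilities."""
    p = pred >= thresh
    t = truth >= thresh
    union = np.logical_or(p, t).sum()
    return np.logical_and(p, t).sum() / union if union else 1.0
```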
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
16 pages, 3617 KiB  
Article
KD-Net: Continuous-Keystroke-Dynamics-Based Human Identification from RGB-D Image Sequences
by Xinxin Dai, Ran Zhao, Pengpeng Hu and Adrian Munteanu
Sensors 2023, 23(20), 8370; https://doi.org/10.3390/s23208370 - 10 Oct 2023
Viewed by 810
Abstract
Keystroke dynamics is a soft biometric based on the assumption that humans always type in uniquely characteristic manners. Previous works mainly focused on analyzing key press or release events. Unlike these methods, we explored a novel visual modality of keystroke dynamics for human identification using a single RGB-D sensor. To verify this idea, we created a dataset dubbed KD-MultiModal, which contains 243.2 K frames of RGB images and depth images, obtained by recording videos of hand typing with a single RGB-D sensor. The dataset comprises RGB-D image sequences of 20 subjects (10 males and 10 females) typing sentences, with each subject typing around 20 sentences. In this task, only the hand and keyboard regions contribute to person identification, so we also propose methods for extracting Regions of Interest (RoIs) for each type of data. Unlike key press or release data, our dataset not only captures the velocity of pressing and releasing different keys and the typing style of specific keys or combinations of keys, but also contains rich information on hand shape and posture. To verify the validity of the proposed data, we adopted deep neural networks (RGB-KD-Net, D-KD-Net, and RGBD-KD-Net) to learn distinguishing features from the different data representations. Since sequences of point clouds can also be obtained from the depth images given the intrinsic parameters of the RGB-D sensor, we additionally studied the performance of human identification based on point clouds. Extensive experimental results showed that the approach works and that the method based on RGB-D images performs best, achieving 99.44% accuracy on unseen real-world data. To inspire more researchers and facilitate relevant studies, the proposed dataset will be made publicly accessible together with the publication of this paper. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
15 pages, 3822 KiB  
Article
Foreground Segmentation-Based Density Grading Networks for Crowd Counting
by Zelong Liu, Xin Zhou, Tao Zhou and Yuanyuan Chen
Sensors 2023, 23(19), 8177; https://doi.org/10.3390/s23198177 - 29 Sep 2023
Viewed by 651
Abstract
Estimating object counts within a single image or video frame represents a challenging yet pivotal task in the field of computer vision. Its increasing significance arises from its versatile applications across various domains, including public safety and urban planning. Among the various object counting tasks, crowd counting is particularly notable for its critical role in social security and urban planning. However, intricate backgrounds in images often lead to misidentifications, wherein the complex background is mistaken for the foreground, thereby inflating forecasting errors. Additionally, the uneven distribution of crowd density within the foreground further exacerbates the predictive errors of the network. This paper introduces a novel architecture with a three-branch structure aimed at synergistically incorporating hierarchical foreground information and global scale information into density map estimation, thereby achieving more precise counting results. Hierarchical foreground information guides the network to perform distinct operations on regions with varying densities, while global scale information evaluates the overall density level of the image and adjusts the model’s global predictions accordingly. We also systematically investigate and compare three potential locations for integrating hierarchical foreground information into the density estimation network, ultimately determining the most effective placement. Through extensive comparative experiments across three datasets, we demonstrate the superior performance of our proposed method. Full article
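A skeleton of the three-branch idea, written in PyTorch under the assumption of a shared backbone feeding a density head, a foreground-grading head, and a global-scale head; the layer sizes and the way the scale gate modulates the density map are illustrative, not the paper's configuration:

```python
import torch.nn as nn

class ThreeBranchCounter(nn.Module):
    """Sketch of a three-branch counting network: a shared backbone feeds a
    density-map head, a foreground-grading head, and a global-scale head."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        self.density_head = nn.Conv2d(128, 1, 1)      # per-pixel density
        self.foreground_head = nn.Conv2d(128, 3, 1)   # density-grade mask
        self.scale_head = nn.Sequential(              # global density level
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1))

    def forward(self, x):
        f = self.backbone(x)
        scale = self.scale_head(f).sigmoid().view(-1, 1, 1, 1)
        # The global scale modulates the density prediction; the foreground
        # grading is supervised separately to guide region-specific behavior.
        return self.density_head(f) * scale, self.foreground_head(f)
```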
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
27 pages, 3661 KiB  
Article
Efficient Extraction of Deep Image Features Using a Convolutional Neural Network (CNN) for Detecting Ventricular Fibrillation and Tachycardia
by Azeddine Mjahad, Mohamed Saban, Hossein Azarmdel and Alfredo Rosado-Muñoz
J. Imaging 2023, 9(9), 190; https://doi.org/10.3390/jimaging9090190 - 18 Sep 2023
Cited by 2 | Viewed by 1405
Abstract
To safely select the proper therapy for ventricular fibrillation (VF), it is essential to distinguish it correctly from ventricular tachycardia (VT) and other rhythms. Given that the required therapy is not the same, an erroneous detection might lead to serious injuries to the patient or even cause ventricular fibrillation (VF). The primary innovation of this study lies in employing a CNN to create new features. These features exhibit the capacity and precision to detect and classify cardiac arrhythmias, including VF and VT. The electrocardiographic (ECG) signals used for this assessment were sourced from the established MIT-BIH and AHA databases. The input data to be classified are time–frequency (tf) representation images, specifically Pseudo Wigner–Ville (PWV) ones. Prior to the Pseudo Wigner–Ville (PWV) calculation, preprocessing for denoising, signal alignment, and segmentation is necessary. In order to check the validity of the method independently of the classifier, four different CNNs are used: InceptionV3, MobileNet, VGGNet, and AlexNet. The classification results reveal the following values: for VF detection, a sensitivity (Sens) of 98.16%, a specificity (Spe) of 99.07%, and an accuracy (Acc) of 98.91%; for ventricular tachycardia (VT), a sensitivity of 90.45%, a specificity of 99.73%, and an accuracy of 99.09%; for normal sinus rhythms, a sensitivity of 99.34%, a specificity of 98.35%, and an accuracy of 98.89%; finally, for other rhythms, a sensitivity of 96.98%, a specificity of 99.68%, and an accuracy of 99.11%. Furthermore, distinguishing between shockable (VF/VT) and non-shockable rhythms yielded a sensitivity of 99.23%, a specificity of 99.74%, and an accuracy of 99.61%. These results show that using tf representations as images, combined in this case with a CNN classifier, raises the classification performance above the results of previous works. Considering that these results were achieved without the preselection of ECG episodes, it can be concluded that these features may be successfully introduced into Automated External Defibrillation (AED) and Implantable Cardioverter Defibrillation (ICD) therapies, also opening the door to their use in other ECG rhythm detection applications. Full article
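The reported per-class figures follow from standard one-vs-rest confusion counts, as in this sketch (NumPy assumed):

```python
import numpy as np

def sens_spec_acc(y_true, y_pred, positive):
    """Per-class sensitivity, specificity and accuracy, as reported for the
    VF/VT classification results (one-vs-rest counting)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)
```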
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
16 pages, 2875 KiB  
Article
Using Different Types of Artificial Neural Networks to Classify 2D Matrix Codes and Their Rotations—A Comparative Study
by Ladislav Karrach and Elena Pivarčiová
J. Imaging 2023, 9(9), 188; https://doi.org/10.3390/jimaging9090188 - 18 Sep 2023
Viewed by 1385
Abstract
Artificial neural networks can solve various tasks in computer vision, such as image classification, object detection, and general recognition. Our comparative study deals with four types of artificial neural networks—multilayer perceptrons, probabilistic neural networks, radial basis function neural networks, and convolutional neural networks—and investigates their ability to classify 2D matrix codes (Data Matrix codes, QR codes, and Aztec codes) as well as their rotation. The paper presents the basic building blocks of these artificial neural networks and their architecture and compares the classification accuracy of 2D matrix codes under different configurations of these neural networks. A dataset of 3000 synthetic code samples was used to train and test the neural networks. When the neural networks were trained on the full dataset, the convolutional neural network showed its superiority, followed by the RBF neural network and the multilayer perceptron. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
13 pages, 2949 KiB  
Article
Decoding Algorithm of Motor Imagery Electroencephalogram Signal Based on CLRNet Network Model
by Chaozhu Zhang, Hongxing Chu and Mingyuan Ma
Sensors 2023, 23(18), 7694; https://doi.org/10.3390/s23187694 - 06 Sep 2023
Viewed by 834
Abstract
EEG decoding based on motor imagery is an important part of brain–computer interface technology and a key indicator of the overall performance of the brain–computer interface. Due to the complexity of motor imagery EEG feature analysis, traditional classification models rely heavily on the signal preprocessing and feature design stages. End-to-end deep neural networks have been applied to the classification of motor imagery EEG and have shown good results. This study uses a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) network to obtain spatial information and temporal correlation from EEG signals. The use of cross-layer connectivity reduces the network gradient dispersion problem and enhances the overall stability of the network model. The effectiveness of this network model is demonstrated on the BCI Competition IV dataset 2a by integrating CNN, BiLSTM, and ResNet (called CLRNet in this study) to decode motor imagery EEG. The network model combining CNN and BiLSTM achieved 87.0% accuracy in classifying motor imagery patterns across four classes. Adding ResNet for cross-layer connectivity enhanced network stability and further improved the accuracy by 2.0%, reaching 89.0% classification accuracy. The experimental results show that CLRNet performs well in decoding the motor imagery EEG dataset. This study provides a better solution for motor imagery EEG decoding in brain–computer interface technology research. Full article
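A minimal PyTorch sketch of the CNN + BiLSTM combination with a residual (cross-layer) path, in the spirit of CLRNet; channel counts, kernel sizes, and the placement of the skip connection are assumptions, not the published architecture:

```python
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Sketch of a CNN + BiLSTM EEG decoder: a 1D CNN extracts spatial
    features per time step, a BiLSTM models temporal correlation, and a
    ResNet-style skip path stabilizes training."""
    def __init__(self, n_channels=22, n_classes=4, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, padding=3),
            nn.BatchNorm1d(64), nn.ELU())
        self.lstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.skip = nn.Linear(64, 2 * hidden)   # cross-layer (residual) path
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, channels, time)
        f = self.cnn(x).transpose(1, 2)         # (batch, time, 64)
        out, _ = self.lstm(f)
        out = out + self.skip(f)                # residual connection
        return self.fc(out.mean(dim=1))         # pool over time, classify
```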
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
24 pages, 10442 KiB  
Article
CRABR-Net: A Contextual Relational Attention-Based Recognition Network for Remote Sensing Scene Objective
by Ningbo Guo, Mingyong Jiang, Lijing Gao, Yizhuo Tang, Jinwei Han and Xiangning Chen
Sensors 2023, 23(17), 7514; https://doi.org/10.3390/s23177514 - 29 Aug 2023
Cited by 3 | Viewed by 813
Abstract
Remote sensing scene objective recognition (RSSOR) has significant application value in both military and civilian fields. Convolutional neural networks (CNNs) have greatly advanced intelligent objective recognition technology for remote sensing scenes, but most methods using CNNs for high-resolution RSSOR either use only the feature map of the last layer or directly fuse the feature maps from various layers by “summation”, which not only ignores the favorable relationship information between adjacent layers but also leads to redundancy and loss of feature information, hindering improvements in recognition accuracy. In this study, a contextual relational attention-based recognition network (CRABR-Net) is presented, which extracts convolutional feature maps from different CNN layers, emphasizes important feature content using a simple, parameter-free attention module (SimAM), fuses adjacent feature maps using a complementary relationship feature map calculation, improves feature learning ability using an enhanced relationship feature map calculation, and finally uses the concatenated feature maps from different layers for RSSOR. Experimental results show that CRABR-Net exploits the relationships between different CNN layers to improve recognition performance and achieves better results than several state-of-the-art algorithms, with average accuracies on AID, UC-Merced, and RSSCN7 of up to 96.46%, 99.20%, and 95.43% under generic training ratios. Full article
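The SimAM module referenced above is parameter-free and can be written in a few lines of PyTorch following its published energy formulation; only the regularization constant lambda is a tunable choice:

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention, following the published SimAM energy
    formulation (lambda is its only hyperparameter)."""
    def __init__(self, lam=1e-4):
        super().__init__()
        self.lam = lam

    def forward(self, x):                               # x: (N, C, H, W)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n         # per-channel variance
        e_inv = d / (4 * (v + self.lam)) + 0.5          # inverse energy per unit
        return x * torch.sigmoid(e_inv)                 # reweight activations
```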
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
24 pages, 3972 KiB  
Article
Automatic Facial Aesthetic Prediction Based on Deep Learning with Loss Ensembles
by Jwan Najeeb Saeed, Adnan Mohsin Abdulazeez and Dheyaa Ahmed Ibrahim
Appl. Sci. 2023, 13(17), 9728; https://doi.org/10.3390/app13179728 - 28 Aug 2023
Viewed by 1282
Abstract
Deep data-driven methodologies have significantly enhanced automatic facial beauty prediction (FBP), particularly convolutional neural networks (CNNs). However, despite their wide utilization in classification-based applications, the adoption of CNNs in regression research is still constrained. In addition, biases in the beauty scores assigned to facial images, such as preferences for specific ethnicities or age groups, present challenges to the effective generalization of models, and these may not be appropriately addressed by conventional individual loss functions. Furthermore, regression problems commonly employ the L2 loss to measure error, and this function is sensitive to outliers, making generalization difficult depending on the number of outliers in the training phase. Meanwhile, the L1 loss is another regression loss function that penalizes errors linearly and is less sensitive to outliers. The log-cosh loss function is a flexible and robust loss function for regression problems that provides a good compromise between the L1 and L2 loss functions. Ensembling multiple loss functions has been proven to improve the performance of deep-learning models in various tasks. In this work, we propose ensembling three regression loss functions, namely L1, L2, and log-cosh, by averaging them to create a new composite cost function. This strategy capitalizes on the unique traits of each loss function, constructing a unified framework that harmonizes outlier tolerance, precision, and adaptability. The proposed loss function’s effectiveness was demonstrated by incorporating it with three pretrained CNNs (AlexNet, VGG16-Net, and FIAC-Net) and evaluating it on three FBP benchmarks (SCUT-FBP, SCUT-FBP5500, and MEBeauty). Integrating FIAC-Net with the proposed loss function yields remarkable outcomes across datasets due to its pretraining task of facial-attractiveness classification. The efficacy is evident in managing uncertain noise distributions, resulting in a strong correlation between machine- and human-rated aesthetic scores, along with low error rates. Full article
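A direct PyTorch rendering of the proposed composite cost, averaging L1, L2, and log-cosh; the numerically stable log-cosh identity is standard, while the equal weighting of the three terms follows the averaging described above:

```python
import math
import torch.nn as nn
import torch.nn.functional as F

class CompositeRegressionLoss(nn.Module):
    """Average of the L1, L2 and log-cosh losses for score regression."""
    def forward(self, pred, target):
        err = pred - target
        l1 = err.abs().mean()
        l2 = err.pow(2).mean()
        # Stable identity: log(cosh(x)) = x + softplus(-2x) - log(2).
        logcosh = (err + F.softplus(-2.0 * err) - math.log(2.0)).mean()
        return (l1 + l2 + logcosh) / 3.0
```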
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
28 pages, 21211 KiB  
Article
An Intelligent Sorting Method of Film in Cotton Combining Hyperspectral Imaging and the AlexNet-PCA Algorithm
by Quang Li, Ling Zhao, Xin Yu, Zongbin Liu and Yiqing Zhang
Sensors 2023, 23(16), 7041; https://doi.org/10.3390/s23167041 - 09 Aug 2023
Viewed by 868
Abstract
Long-staple cotton from Xinjiang is renowned for its exceptional quality. However, it is susceptible to contamination with plastic film during mechanical picking. To address the difficulty of removing film from seed cotton, a technique based on hyperspectral images and AlexNet-PCA is proposed to identify the colorless, transparent film in seed cotton. The method consists of black-and-white correction of the hyperspectral images, dimensionality reduction of the hyperspectral data, and training and testing of convolutional neural network (CNN) models. The key challenge is to find the optimal way to reduce the dimensionality of the hyperspectral data, thus reducing the computational cost. The biggest innovation of the paper is the combination of CNNs and dimensionality reduction methods to achieve high-precision intelligent recognition of transparent plastic films. Experiments with three dimensionality reduction methods and three CNN architectures were conducted to seek the optimal model for plastic film recognition. The results demonstrate that AlexNet-PCA-12 achieves the highest recognition accuracy and the best cost performance in dimensionality reduction. In practical sorting tests, the proposed method achieved a 97.02% removal rate of plastic film, providing a modern theoretical model and effective method for high-precision identification of heteropolymers in seed cotton. Full article
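The PCA stage of such a pipeline can be sketched as follows with scikit-learn, reducing each pixel spectrum to a small number of pseudo-bands before the CNN; the component count of 12 mirrors the AlexNet-PCA-12 naming, and the function itself is illustrative:

```python
from sklearn.decomposition import PCA

def reduce_hyperspectral(cube, n_components=12):
    """Reduce a hyperspectral cube of shape (H, W, bands) to n_components
    pseudo-bands with PCA before feeding it to a CNN."""
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands)               # one spectrum per pixel
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(h, w, n_components)   # image ready for the CNN
```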
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
21 pages, 1997 KiB  
Article
Shot Boundary Detection with 3D Depthwise Convolutions and Visual Attention
by Miguel Jose Esteve Brotons, Francisco Javier Lucendo, Rodriguez-Juan Javier and Jose Garcia-Rodriguez
Sensors 2023, 23(16), 7022; https://doi.org/10.3390/s23167022 - 08 Aug 2023
Viewed by 840
Abstract
Shot boundary detection is the process of identifying and locating the boundaries between individual shots in a video sequence. A shot is a continuous sequence of frames captured by a single camera, without any cuts or edits. Recent investigations have shown the effectiveness of 3D convolutional networks for this task due to their high capacity to extract spatiotemporal features of the video and determine the frame in which a transition or shot change occurs. When this task is used as part of a scene segmentation use case aimed at improving the experience of viewing content on streaming platforms, the speed of segmentation is very important for live and near-live use cases such as start-over. The problem with models based on 3D convolutions is the large number of parameters they entail. Standard 3D convolutions impose much higher CPU and memory requirements than the same 2D operations do. In this paper, we rely on depthwise separable convolutions, a scheme that significantly reduces the number of parameters. To compensate for the slight loss of performance, we analyze and propose the use of visual self-attention as an improvement mechanism. Full article
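A depthwise separable 3D convolution of the kind relied on here can be expressed in PyTorch as a grouped spatiotemporal convolution followed by a pointwise mix; this is a generic sketch, not the paper's exact block:

```python
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Depthwise separable 3D convolution: a per-channel spatiotemporal
    convolution followed by a 1x1x1 pointwise mix. For k^3 kernels this cuts
    parameters from C_in*C_out*k^3 to roughly C_in*k^3 + C_in*C_out."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):                        # x: (N, C, T, H, W)
        return self.pointwise(self.depthwise(x))
```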
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
13 pages, 1536 KiB  
Article
Open-Set Recognition of Wood Species Based on Deep Learning Feature Extraction Using Leaves
by Tianyu Fang, Zhenyu Li, Jialin Zhang, Dawei Qi and Lei Zhang
J. Imaging 2023, 9(8), 154; https://doi.org/10.3390/jimaging9080154 - 30 Jul 2023
Viewed by 1283
Abstract
An open-set recognition scheme for tree leaves based on deep learning feature extraction is presented in this study. Deep learning algorithms are used to extract leaf features for different wood species, and the leaf set of a wood species is divided into two datasets: the leaf set of known wood species and the leaf set of unknown species. A deep learning network (CNN) is trained on the leaves of selected known wood species, and the features of the remaining known wood species and all unknown wood species are extracted using the trained CNN. Then, single-class classification is performed using the weighted SVDD algorithm to distinguish the leaves of known from unknown wood species. The features of leaves recognized as belonging to known wood species are fed back into the trained CNN to identify the specific species. The recognition results of the single-class classifier for known and unknown wood species are combined with the recognition results of the multi-class CNN to finally complete the open-set recognition of wood species. We tested the proposed method on the publicly available Swedish Leaf Dataset, which includes 15 wood species (5 used as known and 10 as unknown). The test results showed that, with F1 scores of 0.7797 and 0.8644, mixed recognition rates of 95.15% and 93.14%, and Kappa coefficients of 0.7674 and 0.8644 under two different data distributions, the proposed method outperformed state-of-the-art open-set recognition algorithms in all three aspects. Moreover, the more wood species that are known, the better the recognition. This approach can extract effective features from tree leaf images for open-set recognition and achieve wood species recognition without compromising tree material. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
18 pages, 16622 KiB  
Article
Tomato Maturity Detection and Counting Model Based on MHSA-YOLOv8
by Ping Li, Jishu Zheng, Peiyuan Li, Hanwei Long, Mai Li and Lihong Gao
Sensors 2023, 23(15), 6701; https://doi.org/10.3390/s23156701 - 26 Jul 2023
Cited by 6 | Viewed by 3888
Abstract
The online automated maturity grading and counting of tomato fruits helps promote the digital supervision of fruit growth status and unmanned precision operations during the planting process. Traditional grading and counting of tomato fruit maturity is mostly done manually, which is time-consuming and laborious, and its precision depends on the accuracy of human eye observation. The combination of artificial intelligence and machine vision has, to some extent, solved this problem. In this work, firstly, a digital camera is used to obtain tomato fruit image datasets, taking into account factors such as occlusion and external light interference. Secondly, based on the requirements of the tomato maturity grading task, the MHSA attention mechanism is adopted to improve YOLOv8’s backbone and enhance the network’s ability to extract diverse features. The Precision, Recall, F1-score, and mAP50 of the tomato fruit maturity grading model constructed with MHSA-YOLOv8 were 0.806, 0.807, 0.806, and 0.864, respectively, improving the performance of the model with only a slight increase in model size. Finally, thanks to the excellent performance of MHSA-YOLOv8, the Precision, Recall, F1-score, and mAP50 of the constructed counting models were 0.990, 0.960, 0.975, and 0.916, respectively. The tomato maturity grading and counting model constructed in this study is suitable for both online and offline detection, which greatly helps to improve the harvesting and grading efficiency of tomato growers. The main innovations of this study are summarized as follows: (1) a tomato maturity grading and counting dataset collected from actual production scenarios was constructed; (2) considering the complexity of the environment, a new object detection method, MHSA-YOLOv8, is proposed, and tomato maturity grading models and counting models were constructed, respectively; (3) the models constructed in this study are suitable for both online and offline grading and counting. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
10 pages, 1048 KiB  
Article
Quantitative CT Metrics Associated with Variability in the Diffusion Capacity of the Lung of Post-COVID-19 Patients with Minimal Residual Lung Lesions
by Han Wen, Julio A. Huapaya, Shreya M. Kanth, Junfeng Sun, Brianna P. Matthew, Simone C. Lee, Michael Do, Marcus Y. Chen, Ashkan A. Malayeri and Anthony F. Suffredini
J. Imaging 2023, 9(8), 150; https://doi.org/10.3390/jimaging9080150 - 26 Jul 2023
Cited by 4 | Viewed by 1040
Abstract
(1) Background: A reduction in the diffusion capacity of the lung for carbon monoxide is a prevalent longer-term consequence of COVID-19 infection. In patients who have zero or minimal residual radiological abnormalities in the lungs, it has been debated whether the cause was mainly due to a reduced alveolar volume or involved diffuse interstitial or vascular abnormalities. (2) Methods: We performed a cross-sectional study of 45 patients with either zero or minimal residual lesions in the lungs (total volume < 7 cc) at two months to one year post COVID-19 infection. There was considerable variability in the diffusion capacity of the lung for carbon monoxide, with 27% of the patients at less than 80% of the predicted reference. We investigated a set of independent variables that may affect the diffusion capacity of the lung, including demographic, pulmonary physiology and CT (computed tomography)-derived variables of vascular volume, parenchymal density and residual lesion volume. (3) Results: The leading three variables that contributed to the variability in the diffusion capacity of the lung for carbon monoxide were the alveolar volume, determined via pulmonary function tests, the blood vessel volume fraction, determined via CT, and the parenchymal radiodensity, also determined via CT. These factors explained 49% of the variance of the diffusion capacity, with p values of 0.031, 0.005 and 0.018, respectively, after adjusting for confounders. A multiple-regression model combining these three variables fit the measured values of the diffusion capacity, with R = 0.70 and p < 0.001. (4) Conclusions: The results are consistent with the notion that in some post-COVID-19 patients, after their pulmonary lesions resolve, diffuse changes in the vascular and parenchymal structures, in addition to a low alveolar volume, could be contributors to a lingering low diffusion capacity. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
16 pages, 3403 KiB  
Article
Varroa Destructor Classification Using Legendre–Fourier Moments with Different Color Spaces
by Alicia Noriega-Escamilla, César J. Camacho-Bello, Rosa M. Ortega-Mendoza, José H. Arroyo-Núñez and Lucia Gutiérrez-Lazcano
J. Imaging 2023, 9(7), 144; https://doi.org/10.3390/jimaging9070144 - 14 Jul 2023
Cited by 1 | Viewed by 1540
Abstract
Bees play a critical role in pollination and food production, so their preservation is essential, which particularly highlights the importance of detecting bee diseases early. The Varroa destructor mite is the primary factor contributing to increased viral infections that can lead to hive mortality. This study presents an innovative method for identifying Varroa destructor mites on honey bees using multichannel Legendre–Fourier moments. The descriptors derived from this approach possess distinctive characteristics, such as rotation and scale invariance and noise resistance, allowing digital images to be represented with a minimal number of descriptors. This characteristic is advantageous when analyzing images of living organisms that are not in a static posture. The proposal evaluates the algorithm’s efficiency using different color models, and to enhance its capacity, a subdivision of the VarroaDataset is used. This enhancement allows the algorithm to process additional information about the color and shape of the bee’s legs, wings, eyes, and mouth. To demonstrate the advantages of our approach, we compare it with other deep learning methods, including semantic segmentation techniques such as DeepLabV3 and object detection techniques such as YOLOv5. The results suggest that our proposal offers a promising means for the early detection of the Varroa destructor mite, which could be an essential pillar in the preservation of bees and, therefore, in food production. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
20 pages, 9805 KiB  
Article
Augmented Reality in Maintenance—History and Perspectives
by Ana Malta, Torres Farinha and Mateus Mendes
J. Imaging 2023, 9(7), 142; https://doi.org/10.3390/jimaging9070142 - 10 Jul 2023
Viewed by 1948
Abstract
Augmented Reality (AR) is a technology that allows virtual elements to be superimposed over images of real contexts, whether these are text elements, graphics, or other types of objects. Smart AR glasses are increasingly optimized, and modern ones have features such as a Global Positioning System (GPS), a microphone, and gesture recognition, among others. These devices allow users to keep their hands free to perform tasks while receiving instructions in real time through the glasses. This allows maintenance professionals to carry out interventions more efficiently and in a shorter time than would be necessary without the support of this technology. In the present work, a timeline of important achievements is established, including important findings in object recognition, real-time operation, and integration of technologies for shop floor use. Perspectives on future research and related recommendations are proposed as well. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
11 pages, 2637 KiB  
Article
Fast and Efficient Evaluation of the Mass Composition of Shredded Electrodes from Lithium-Ion Batteries Using 2D Imaging
by Peter Bischoff, Alexandra Kaas, Christiane Schuster, Thomas Härtling and Urs Peuker
J. Imaging 2023, 9(7), 135; https://doi.org/10.3390/jimaging9070135 - 05 Jul 2023
Cited by 2 | Viewed by 1438
Abstract
With the increasing number of electrical devices, especially electric vehicles, the need for efficient recycling processes for electric components is on the rise. Mechanical recycling of lithium-ion batteries includes the comminution of the electrodes and the sorting of the particle mixtures to achieve the highest possible purities of the individual material components (e.g., copper and aluminum). An important part of recycling is the quantitative determination of the yield and recovery rate, which is required to adapt the processes to different feed materials. Since this is usually done by sorting individual particles manually before determining the mass of each material, we developed a novel method for automating this evaluation process. The method detects the different material particles in images using simple thresholding techniques and analyzes the correlation between the area of each material in the field of view and the mass in previously prepared samples. This correlation can then be applied to further samples to determine their mass composition. Using this automated method, the process is accelerated, the accuracy is improved compared to a human operator, and the cost of the evaluation process is reduced. Full article
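A minimal sketch of the thresholding-based area measurement, assuming OpenCV and per-material HSV color ranges; the dictionary-of-ranges interface and any threshold values are illustrative and would be calibrated to the actual feed material:

```python
import cv2
import numpy as np

def material_area_fractions(image_bgr, hsv_ranges):
    """Segment each material by color thresholding and return its area
    fraction in the field of view. hsv_ranges maps a material name to a
    (lower, upper) HSV bound pair."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    total = image_bgr.shape[0] * image_bgr.shape[1]
    fractions = {}
    for name, (lo, hi) in hsv_ranges.items():
        mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
        fractions[name] = cv2.countNonZero(mask) / total
    return fractions  # regress these against known sample masses
```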
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
15 pages, 5804 KiB  
Article
MineSDS: A Unified Framework for Small Object Detection and Drivable Area Segmentation for Open-Pit Mining Scenario
by Yong Liu, Cheng Li, Jiade Huang and Ming Gao
Sensors 2023, 23(13), 5977; https://doi.org/10.3390/s23135977 - 27 Jun 2023
Cited by 1 | Viewed by 1252
Abstract
To tackle the challenges posed by dense small objects and fuzzy boundaries on unstructured roads in mining scenarios, we propose an end-to-end small object detection and drivable area segmentation framework for open-pit mining. We employed a convolutional network backbone as a feature extractor for both tasks, as multi-task learning has yielded promising results in autonomous driving perception. To address small object detection, we introduced a lightweight attention module that allows our network to focus more on the spatial and channel dimensions of small objects without impeding inference time. We also used a convolutional block attention module in the drivable area segmentation subnetwork, which assigns more weight to road boundaries to improve feature mapping capabilities. Furthermore, to improve the network’s perception accuracy on both tasks, we used a weighted summation when designing the loss function. We validated the effectiveness of our approach by testing it on pre-collected mining data called Minescape. Our detection results on the Minescape dataset showed an mAP of 87.8%, which is 9.3% higher than state-of-the-art algorithms, and our segmentation results surpassed the comparison algorithm by 1% in MIoU. These experimental results demonstrate that our approach achieves competitive performance. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
23 pages, 29094 KiB  
Article
An Approach for 3D Modeling of the Regular Relief Surface Topography Formed by a Ball Burnishing Process Using 2D Images and Measured Profilograms
by Stoyan Slavov, Lyubomir Si Bao Van, Diyan Dimitrov and Boris Nikolov
Sensors 2023, 23(13), 5801; https://doi.org/10.3390/s23135801 - 21 Jun 2023
Viewed by 961
Abstract
Advanced in the present paper is an innovative approach for three-dimensional modeling of the regular relief topography formed by a ball burnishing process. The proposed methodology involves capturing a greyscale image of the surface topography and measuring its profile in two perpendicular directions using a stylus method. A specially developed algorithm then identifies the best match between the measured profile segment and a row or column of the captured topography image by carrying out a signal correlation assessment based on an appropriate similarity metric. To ensure accurate scaling, the image pixel grey levels are scaled with a factor calculated as the larger ratio between the ultimate heights of the measured profilograms and the best-matched image row/column. Nine different similarity metrics were tested to determine the best-performing model. The developed approach was evaluated for eight distinct types of fully and partially regular reliefs, and the results reveal that the best-scaled 3D topography models are produced for the fully regular reliefs with much greater heights. Following a thorough analysis of the results obtained, at the end of the paper we draw some conclusions and discuss potential future work. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
20 pages, 2936 KiB  
Article
Analysis of the Asymmetry between Both Eyes in Early Diagnosis of Glaucoma Combining Features Extracted from Retinal Images and OCTs into Classification Models
by Francisco Rodríguez-Robles, Rafael Verdú-Monedero, Rafael Berenguer-Vidal, Juan Morales-Sánchez and Inmaculada Sellés-Navarro
Sensors 2023, 23(10), 4737; https://doi.org/10.3390/s23104737 - 14 May 2023
Viewed by 1393
Abstract
This study aims to analyze the asymmetry between both eyes of the same patient for the early diagnosis of glaucoma. Two imaging modalities, retinal fundus images and optical coherence tomographies (OCTs), have been considered in order to compare their capabilities for glaucoma detection. From retinal fundus images, the difference between the cup/disc ratio and the width of the optic rim has been extracted. Analogously, the thickness of the retinal nerve fiber layer has been measured in spectral-domain optical coherence tomographies. These measurements have been used as between-eye asymmetry characteristics in decision tree and support vector machine models for the classification of healthy and glaucoma patients. The main contribution of this work is the use of different classification models with both imaging modalities to jointly exploit the strengths of each modality for the same diagnostic purpose, based on the asymmetry characteristics between the eyes of the patient. The results show that the optimized classification models provide better performance with OCT asymmetry features between both eyes (sensitivity 80.9%, specificity 88.2%, precision 66.7%, accuracy 86.5%) than with those extracted from retinographies, although a linear relationship has been found between certain asymmetry features extracted from both imaging modalities. Therefore, the resulting performance of the models based on asymmetry features proves their ability to differentiate healthy from glaucoma patients using these metrics. Models trained from fundus characteristics are a useful option as a glaucoma screening method in the healthy population, although their performance is lower than that of models trained from the thickness of the peripapillary retinal nerve fiber layer. In both imaging modalities, the asymmetry of morphological characteristics can be used as a glaucoma indicator, as detailed in this work. Full article
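The classification stage can be sketched with scikit-learn as below; the feature layout (one row of inter-eye asymmetry measurements per patient) follows the description above, while the RBF kernel and five-fold evaluation are illustrative defaults:

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_asymmetry_classifier(X, y, folds=5):
    """Cross-validate an SVM on inter-eye asymmetry features. X holds one row
    per patient (e.g., cup/disc ratio difference, rim width difference, or
    RNFL thickness difference between eyes); y is healthy (0) vs glaucoma (1)."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return cross_val_score(clf, X, y, cv=folds, scoring="accuracy").mean()
```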
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
17 pages, 16376 KiB  
Article
Gaze-Dependent Image Re-Ranking Technique for Enhancing Content-Based Image Retrieval
by Yuhu Feng, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama
Appl. Sci. 2023, 13(10), 5948; https://doi.org/10.3390/app13105948 - 11 May 2023
Viewed by 1513
Abstract
Content-based image retrieval (CBIR) aims to find desired images similar to the image input by the user, and it is extensively used in the real world. Conventional CBIR methods do not consider user preferences since they determine retrieval results only by the degree of resemblance between the query and potential candidate images. For this reason, a “semantic gap” appears, as the model may not accurately understand the intention a user has embedded in the query image. In this article, we propose a re-ranking method for CBIR that uses a user’s gaze trace as interactive information to help the model predict the user’s inherent attention. The proposed method uses the gaze trace corresponding to the images obtained from the initial retrieval as the user’s preference information. We introduce image captioning to effectively express the relationship between images and gaze information by generating image captions based on the gaze trace. As a result, we can transform the coordinate data into a text format and explicitly express the semantic information of the images. Finally, image retrieval is performed again using the generated gaze-dependent image captions to obtain images that align more accurately with the user’s preferences or interests. The experimental results on an open image dataset with corresponding gaze traces and human-generated descriptions demonstrate the efficacy of the proposed method. Our method treats visual information as the user’s feedback to achieve user-oriented image retrieval. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
18 pages, 2165 KiB  
Article
Real-Time Machine Learning-Based Driver Drowsiness Detection Using Visual Features
by Yaman Albadawi, Aneesa AlRedhaei and Maen Takruri
J. Imaging 2023, 9(5), 91; https://doi.org/10.3390/jimaging9050091 - 29 Apr 2023
Cited by 9 | Viewed by 10735
Abstract
Drowsiness-related car accidents continue to have a significant effect on road safety. Many of these accidents can be eliminated by alerting the drivers once they start feeling drowsy. This work presents a non-invasive system for real-time driver drowsiness detection using visual features. These features are extracted from videos obtained from a camera installed on the dashboard. The proposed system uses facial landmarks and face mesh detectors to locate the regions of interest where mouth aspect ratio, eye aspect ratio, and head pose features are extracted and fed to three different classifiers: random forest, sequential neural network, and linear support vector machine classifiers. Evaluations of the proposed system over the National Tsing Hua University driver drowsiness detection dataset showed that it can successfully detect and alarm drowsy drivers with an accuracy up to 99%. Full article
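The eye aspect ratio feature mentioned above is a standard computation over six eye-contour landmarks; a sketch assuming the common 68-point landmark ordering:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Eye aspect ratio from six landmark points ordered as in the common
    68-point facial landmark scheme (p1..p6 around the eye contour). The EAR
    drops toward zero as the eye closes, which is the drowsiness cue."""
    p1, p2, p3, p4, p5, p6 = np.asarray(eye, dtype=float)
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)
```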
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
16 pages, 1669 KiB  
Article
Progressively Hybrid Transformer for Multi-Modal Vehicle Re-Identification
by Wenjie Pan, Linhan Huang, Jianbao Liang, Lan Hong and Jianqing Zhu
Sensors 2023, 23(9), 4206; https://doi.org/10.3390/s23094206 - 23 Apr 2023
Cited by 4 | Viewed by 1777
Abstract
Multi-modal (i.e., visible, near-infrared, and thermal-infrared) vehicle re-identification has good potential for searching for vehicles of interest in low illumination. However, because different modalities have varying imaging characteristics, proper fusion of multi-modal complementary information is crucial to multi-modal vehicle re-identification. To that end, this paper proposes a progressively hybrid transformer (PHT). The PHT method consists of two components: random hybrid augmentation (RHA) and a feature hybrid mechanism (FHM). For RHA, an image random cropper and a local region hybrider are designed. The image random cropper simultaneously crops multi-modal images at random positions, with random numbers, random sizes, and random aspect ratios to generate local regions. The local region hybrider fuses the cropped regions so that regions of each modality carry the local structural characteristics of all modalities, mitigating modal differences at the beginning of feature learning. For the FHM, a modal-specific controller and a modal information embedding are designed to effectively fuse multi-modal information at the feature level. Experimental results show that the proposed method outperforms the state-of-the-art method by 2.7% mAP on RGBNT100 and 6.6% mAP on RGBN300, demonstrating that it can learn multi-modal complementary information effectively. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
10 pages, 1165 KiB  
Brief Report
Invariant Pattern Recognition with Log-Polar Transform and Dual-Tree Complex Wavelet-Fourier Features
by Guangyi Chen and Adam Krzyzak
Sensors 2023, 23(8), 3842; https://doi.org/10.3390/s23083842 - 09 Apr 2023
Cited by 1 | Viewed by 1186
Abstract
In this paper, we propose a novel method for 2D pattern recognition by extracting features with the log-polar transform, the dual-tree complex wavelet transform (DTCWT), and the 2D fast Fourier transform (FFT2). Our new method is invariant to translation, rotation, and scaling of the input 2D pattern images in a multiresolution way, which is very important for invariant pattern recognition. We know that very low-resolution sub-bands lose important features in the pattern images, and very high-resolution sub-bands contain significant amounts of noise. Therefore, intermediate-resolution sub-bands are good for invariant pattern recognition. Experiments on one printed Chinese character dataset and one 2D aircraft dataset show that our new method is better than two existing methods for a combination of rotation angles, scaling factors, and different noise levels in the input pattern images in most testing cases. Full article
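The invariance argument (a log-polar remap turns rotation and scaling into shifts, and the FFT magnitude is shift-invariant) can be sketched as follows with OpenCV and NumPy; the DTCWT stage is omitted for brevity, so this is only a partial illustration of the full feature chain:

```python
import cv2
import numpy as np

def logpolar_fft_features(image, out_size=128):
    """Rotation/scale-tolerant features: a log-polar remap turns rotation and
    scaling of the input into shifts, and the FFT magnitude is invariant to
    such shifts. Expects (or converts to) a single-channel image."""
    if image.ndim == 3:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    h, w = image.shape
    lp = cv2.warpPolar(image.astype(np.float32), (out_size, out_size),
                       (w / 2.0, h / 2.0), min(h, w) / 2.0,
                       cv2.INTER_LINEAR + cv2.WARP_POLAR_LOG)
    mag = np.abs(np.fft.fft2(lp))            # shift-invariant magnitude
    return mag / (mag.max() + 1e-12)         # normalized feature map
```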
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
14 pages, 1674 KiB  
Article
YOLOv5s-CA: A Modified YOLOv5s Network with Coordinate Attention for Underwater Target Detection
by Ge Wen, Shaobao Li, Fucai Liu, Xiaoyuan Luo, Meng-Joo Er, Mufti Mahmud and Tao Wu
Sensors 2023, 23(7), 3367; https://doi.org/10.3390/s23073367 - 23 Mar 2023
Cited by 20 | Viewed by 3622
Abstract
Underwater target detection techniques have been extensively applied to underwater vehicles for marine surveillance, aquaculture, and rescue applications. However, due to complex underwater environments and insufficient training samples, the accuracy of existing underwater target recognition algorithms is still unsatisfactory, and a long-term effort is essential to improving it. To achieve this goal, in this work, we propose a modified YOLOv5s network, called the YOLOv5s-CA network, which embeds a Coordinate Attention (CA) module and a Squeeze-and-Excitation (SE) module, aiming to concentrate more computing power on the target to improve detection accuracy. Based on the existing YOLOv5s network, the number of bottlenecks in the first C3 module was increased from one to three to improve the performance of shallow feature extraction. The CA module was embedded into the C3 modules to increase the attention focused on the target, and the SE layer was added to the output of the C3 modules to strengthen model attention. Experiments on the data of the 2019 China Underwater Robot Competition were conducted, and the results demonstrate that the mean Average Precision (mAP) of the modified YOLOv5s network was increased by 2.4%. Full article
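A Squeeze-and-Excitation layer of the kind appended to the C3 outputs can be written in a few lines of PyTorch; the reduction ratio of 16 is a common default, not necessarily the paper's setting:

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation layer: global average pooling squeezes each
    channel, a two-layer bottleneck produces per-channel weights that
    rescale the feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).view(x.size(0), -1, 1, 1)   # per-channel gate
        return x * w
```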
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
22 pages, 8315 KiB  
Article
Real-Time Fire Smoke Detection Method Combining a Self-Attention Mechanism and Radial Multi-Scale Feature Connection
by Chuan Jin, Anqi Zheng, Zhaoying Wu and Changqing Tong
Sensors 2023, 23(6), 3358; https://doi.org/10.3390/s23063358 - 22 Mar 2023
Cited by 3 | Viewed by 2210
Abstract
Fire remains a pressing issue that requires urgent attention. Due to its uncontrollable and unpredictable nature, it can easily trigger chain reactions and increase the difficulty of extinguishing, posing a significant threat to people’s lives and property. The effectiveness of traditional photoelectric- or ionization-based detectors is inhibited when detecting fire smoke due to the variable shape, characteristics, and scale of the detected objects and the small size of the fire source in the early stages. Additionally, the uneven distribution of fire and smoke and the complexity and variety of their surroundings result in inconspicuous pixel-level features, making identification difficult. We propose a real-time fire smoke detection algorithm based on multi-scale feature information and an attention mechanism. Firstly, the feature information layers extracted from the network are fused into a radial connection to enhance the semantic and location information of the features. Secondly, to address the challenge of recognizing harsh fire sources, we designed a permutation self-attention mechanism that concentrates on features along the channel and spatial directions to gather contextual information as accurately as possible. Thirdly, we constructed a new feature extraction module to increase the detection efficiency of the network while retaining feature information. Finally, we propose a cross-grid sample matching approach and a weighted decay loss function to handle the issue of imbalanced samples. Our model achieves the best detection results compared with standard detection methods on a handcrafted fire smoke detection dataset, with APval reaching 62.5%, APSval reaching 58.5%, and FPS reaching 113.6. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

21 pages, 13516 KiB  
Article
Real-Time Target Detection System for Animals Based on Self-Attention Improvement and Feature Extraction Optimization
by Mingyu Zhang, Fei Gao, Wuping Yang and Haoran Zhang
Appl. Sci. 2023, 13(6), 3987; https://doi.org/10.3390/app13063987 - 21 Mar 2023
Cited by 5 | Viewed by 2598
Abstract
In this paper, we propose a wildlife detection algorithm based on an improved YOLOv5s, using datasets that combine real wildlife images of six kinds, varying in size and form. Firstly, we use the RepVGG model, which integrates the ideas of VGG and ResNet, to simplify the network structure. RepVGG introduces structural reparameterization to keep the model flexible while reducing computational effort; this not only enhances feature extraction but also speeds up computation, further improving real-time performance. Secondly, we use the sliding-window method of the Swin Transformer module to divide the feature map, speeding up the convergence of the model and improving its real-time performance. Then, the C3TR module is introduced to segment the feature map, expand its receptive field, mitigate vanishing and exploding gradients during backpropagation, and enhance the feature extraction and feature fusion abilities of the model. Finally, the model is improved using SimOTA, a positive and negative sample matching strategy that introduces a cost matrix to obtain the highest accuracy at the minimum cost. The experimental results show that the improved YOLOv5s algorithm proposed in this paper improves mAP by 3.2% and FPS by 11.9 compared with the original YOLOv5s algorithm. In addition, the improved YOLOv5s model shows clear advantages in detection accuracy and speed over other common target detection algorithms on our animal dataset, demonstrating the effectiveness and superiority of the improved YOLOv5s algorithm for animal target detection. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
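Structural reparameterization is the key idea borrowed from RepVGG: at inference, the parallel 3x3, 1x1, and identity branches (each followed by BatchNorm) are folded into a single 3x3 convolution. A self-contained PyTorch sketch with a numerical sanity check; the channel count and block layout are illustrative, not the paper's exact network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(weight, bn):
    """Fold BatchNorm into the preceding conv: w' = w * gamma/std, b' = beta - mean*gamma/std."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std
    return weight * scale.reshape(-1, 1, 1, 1), bn.bias - bn.running_mean * scale

def reparameterize(conv3, bn3, conv1, bn1, bn_id, channels):
    """Merge the 3x3+BN, 1x1+BN and identity+BN branches into one 3x3 kernel/bias."""
    w3, b3 = fuse_conv_bn(conv3.weight, bn3)
    w1, b1 = fuse_conv_bn(conv1.weight, bn1)
    w1 = F.pad(w1, [1, 1, 1, 1])                  # 1x1 kernel -> zero-padded 3x3
    w_id = torch.zeros(channels, channels, 3, 3)
    for i in range(channels):
        w_id[i, i, 1, 1] = 1.0                    # identity expressed as a 3x3 kernel
    w_id, b_id = fuse_conv_bn(w_id, bn_id)
    return w3 + w1 + w_id, b3 + b1 + b_id

# Sanity check: the fused conv reproduces the three-branch training-time block.
c = 8
conv3, bn3 = nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c)
conv1, bn1 = nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c)
bn_id = nn.BatchNorm2d(c)
for m in (bn3, bn1, bn_id):
    m.eval()
x = torch.randn(2, c, 16, 16)
with torch.no_grad():
    y_train = bn3(conv3(x)) + bn1(conv1(x)) + bn_id(x)
    w, b = reparameterize(conv3, bn3, conv1, bn1, bn_id, c)
    y_fused = F.conv2d(x, w, b, padding=1)
print(torch.allclose(y_train, y_fused, atol=1e-5))  # True
```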

17 pages, 2573 KiB  
Article
Left Ventricle Detection from Cardiac Magnetic Resonance Relaxometry Images Using Visual Transformer
by Lisa Anita De Santi, Antonella Meloni, Maria Filomena Santarelli, Laura Pistoia, Anna Spasiano, Tommaso Casini, Maria Caterina Putti, Liana Cuccia, Filippo Cademartiri and Vincenzo Positano
Sensors 2023, 23(6), 3321; https://doi.org/10.3390/s23063321 - 21 Mar 2023
Cited by 1 | Viewed by 1804
Abstract
Left Ventricle (LV) detection from Cardiac Magnetic Resonance (CMR) imaging is a fundamental step, preliminary to myocardium segmentation and characterization. This paper focuses on the application of a Visual Transformer (ViT), a novel neural network architecture, to automatically detect the LV in CMR relaxometry sequences. We implemented an object detector based on the ViT model to identify the LV in CMR multi-echo T2* sequences. We evaluated performance differentiated by slice location according to the American Heart Association model, using 5-fold cross-validation and an independent dataset of CMR T2*, T2, and T1 acquisitions. To the best of our knowledge, this is the first attempt to localize the LV in relaxometry sequences and the first application of ViT to LV detection. We obtained an Intersection over Union (IoU) index of 0.68 and a Correct Identification Rate (CIR) of the blood pool centroid of 0.99, comparable with other state-of-the-art methods. IoU and CIR values were significantly lower in apical slices. No significant difference in performance was observed on the independent T2* dataset (IoU = 0.68, p = 0.405; CIR = 0.94, p = 0.066). Performance was significantly worse on the independent T2 and T1 datasets (T2: IoU = 0.62, CIR = 0.95; T1: IoU = 0.67, CIR = 0.98), but still encouraging considering the different types of acquisition. This study confirms the feasibility of applying ViT architectures to LV detection and defines a benchmark for relaxometry imaging. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
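The two reported metrics are easy to pin down in code: box IoU, and, on one reading of CIR, whether the ground-truth blood-pool centroid falls inside the predicted box. A small sketch; the CIR definition used here is an assumption for illustration, not taken from the paper:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def centroid_identified(pred_box, centroid):
    """Correct identification: the blood-pool centroid lies inside the predicted box."""
    x, y = centroid
    return pred_box[0] <= x <= pred_box[2] and pred_box[1] <= y <= pred_box[3]

pred, gt = (30, 40, 90, 100), (35, 38, 95, 98)
print(round(iou(pred, gt), 3), centroid_identified(pred, (60, 70)))
```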

17 pages, 912 KiB  
Article
Skew Class-Balanced Re-Weighting for Unbiased Scene Graph Generation
by Haeyong Kang and Chang D. Yoo
Mach. Learn. Knowl. Extr. 2023, 5(1), 287-303; https://doi.org/10.3390/make5010018 - 10 Mar 2023
Cited by 2 | Viewed by 2062
Abstract
An unbiased scene graph generation (SGG) algorithm referred to as Skew Class-Balanced Re-Weighting (SCR) is proposed to address the biased predicate predictions caused by the long-tailed distribution. Prior works focus mainly on alleviating the deteriorating performance of minority predicate predictions, at the cost of drastically dropping recall scores, i.e., losing performance on the majority predicates. The trade-off between majority and minority predicate performance in the limited SGG datasets has not yet been properly analyzed. In this paper, to alleviate this issue, the Skew Class-Balanced Re-Weighting (SCR) loss function is proposed for unbiased SGG models. Leveraging the skewness of biased predicate predictions, SCR estimates the target predicate weight coefficients and then assigns larger weights to the biased predicates, better trading off the majority predicates against the minority ones. Extensive experiments conducted on the standard Visual Genome dataset and Open Images V4 and V6 show the performance and generality of SCR with traditional SGG models. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
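The SCR weight estimator itself is defined in the paper; as a generic illustration of the broader idea of class-balanced re-weighting for long-tailed predicate distributions, the sketch below uses the well-known "effective number" weighting of Cui et al. (2019) inside a weighted cross-entropy. It is a stand-in for the idea, not the authors' skewness-based estimator:

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(class_counts, beta=0.999):
    """'Effective number' class weights: rarer predicates get larger weights."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    effective = 1.0 - beta ** counts
    w = (1.0 - beta) / effective
    return w / w.sum() * len(counts)          # normalise to mean 1

# Toy long-tailed predicate distribution: 'on' dominates, 'riding' is rare.
counts = [50000, 3000, 120]                   # e.g. on / holding / riding
weights = class_balanced_weights(counts)

logits = torch.randn(8, 3)                    # predicate logits for 8 relations
labels = torch.randint(0, 3, (8,))
loss = F.cross_entropy(logits, labels, weight=weights)
print(weights, loss.item())
```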

15 pages, 4932 KiB  
Article
Research on Crack Width Measurement Based on Binocular Vision and Improved DeeplabV3+
by Chaoxin Chen and Peng Shen
Appl. Sci. 2023, 13(5), 2752; https://doi.org/10.3390/app13052752 - 21 Feb 2023
Cited by 6 | Viewed by 1976
Abstract
Crack width is the main manifestation of concrete material deterioration. To measure crack information quickly and conveniently, a non-contact measurement method for cracks in planar concrete structures based on binocular vision is proposed. Firstly, an improved DeeplabV3+ semantic segmentation model is proposed, which uses L-MobileNetV2 as the backbone feature extraction network, adopts the IDAM structure to extract high-level semantic information, introduces the ECA attention mechanism, and optimizes the loss function of the model to achieve high-precision segmentation of crack areas. Secondly, the plane-space coordinate equation of the concrete structure is constructed based on the principle of binocular vision and SIFT feature point matching, and the crack width is calculated by combining it with the segmented image. Finally, to verify the performance of the above method, a measurement test platform was built. The experimental results show that the RMSE of the crack measurements obtained with the algorithm is less than 0.2 mm and the error rate is less than 4%, with stable accuracy across different measurement angles. The method thus enables fast and convenient measurement of the crack width of planar concrete structures in outdoor environments. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
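The binocular measurement rests on pinhole stereo geometry: depth from disparity, then back-projection of a pixel width to millimetres. A minimal sketch with illustrative camera constants, not the paper's calibration:

```python
def depth_from_disparity(f_px: float, baseline_mm: float, disparity_px: float) -> float:
    """Pinhole stereo geometry: Z = f * B / d."""
    return f_px * baseline_mm / disparity_px

def pixel_width_to_mm(width_px: float, depth_mm: float, f_px: float) -> float:
    """Back-project a pixel distance on the image plane to metric size at depth Z."""
    return width_px * depth_mm / f_px

# Illustrative numbers: 1200 px focal length, 60 mm baseline,
# a matched SIFT feature pair with 48 px disparity, crack 2.5 px wide.
z = depth_from_disparity(1200.0, 60.0, 48.0)              # -> 1500 mm working distance
print(round(pixel_width_to_mm(2.5, z, 1200.0), 3), "mm")  # -> 3.125 mm
```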

14 pages, 7867 KiB  
Article
Hyperspectral Imaging Sorting of Refurbishment Plasterboard Waste
by Miguel Castro-Díaz, Mohamed Osmani, Sergio Cavalaro, Íñigo Cacho, Iratxe Uria, Paul Needham, Jeremy Thompson, Bill Parker and Tatiana Lovato
Appl. Sci. 2023, 13(4), 2413; https://doi.org/10.3390/app13042413 - 13 Feb 2023
Viewed by 1695
Abstract
Post-consumer plasterboard waste sorting is carried out manually by operators, which is time-consuming and costly. In this work, a laboratory-scale hyperspectral imaging (HSI) system was evaluated for automatic refurbishment plasterboard waste sorting. The HSI system was trained to differentiate between plasterboard (gypsum core between two lining papers) and contaminants (e.g., wood, plastics, mortar or ceramics). Segregated plasterboard samples were crushed and sieved to obtain gypsum particles of less than 250 microns, which were characterized through X-ray fluorescence to determine their chemical purity levels. Refurbishment plasterboard waste particles <10 mm in size were not processed with the HSI-based sorting system because the manual processing of these particles at a laboratory scale would have been very time-consuming. Gypsum from refurbishment plasterboard waste particles <10 mm in size contained very small amounts of undesirable chemical impurities for plasterboard manufacturing (chloride, magnesium, sodium, potassium and phosphorus salts), and its chemical purity was similar to that of the gypsum from HSI-sorted plasterboard (96 wt%). The combination of unprocessed refurbishment plasterboard waste <10 mm with HSI-sorted plasterboard ≥10 mm in size led to a plasterboard recovery yield >98 wt%. These findings underpin the potential implementation of an industrial-scale HSI system for plasterboard waste sorting. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

13 pages, 1506 KiB  
Article
Infrared Macrothermoscopy Patterns—A New Category of Dermoscopy
by Flavio Leme Ferrari, Marcos Leal Brioschi, Carlos Dalmaso Neto and Carlos Roberto de Medeiros
J. Imaging 2023, 9(2), 36; https://doi.org/10.3390/jimaging9020036 - 06 Feb 2023
Cited by 2 | Viewed by 1716
Abstract
(1) Background: The authors developed a new non-invasive dermatological infrared macroimaging analysis technique (MacroIR) that evaluates microvascular, inflammatory, and metabolic changes and may be complementary to dermoscopy. Different skin and mucosal lesions were analyzed in a combined way—naked eye, polarized light dermatoscopy (PLD), and MacroIR—and the results were compared; (2) Methods: ten cases were evaluated using a smartphone coupled with a dermatoscope and a macro lens with an integrated far-infrared transducer, together with dedicated software to capture and organize high-resolution images in different electromagnetic spectra, which were then analyzed by a dermatologist; (3) Results: It was possible to identify and compare structures found in the two dermoscopic forms. Visual anatomical changes correlated with MacroIR and aided dermatological analysis of the skin surface, providing microvascular, inflammatory, and metabolic data on the studied area. All MacroIR images correlated with PLD, naked eye examination, and histopathological findings; (4) Conclusion: MacroIR and clinical dermatologist concordance rates were comparable for all dermatological conditions in this study. MacroIR imaging is a promising method that can improve the diagnosis of dermatological diseases. The observations are preliminary and require further evaluation in larger studies. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

25 pages, 14005 KiB  
Article
On Deceiving Malware Classification with Section Injection
by Adeilson Antonio da Silva and Mauricio Pamplona Segundo
Mach. Learn. Knowl. Extr. 2023, 5(1), 144-168; https://doi.org/10.3390/make5010009 - 16 Jan 2023
Cited by 3 | Viewed by 2405
Abstract
We investigate how to modify executable files to deceive malware classification systems. This work’s main contribution is a methodology to inject bytes randomly across a malware file and use it both as an attack to decrease classification accuracy and as a defensive method that augments the data available for training. It respects the operating-system file format to make sure the malware will still execute after our injection and will not change its behavior. We reproduced five state-of-the-art malware classification approaches to evaluate our injection scheme: one based on a Global Image Descriptor (GIST) + K-Nearest-Neighbors (KNN), three Convolutional Neural Network (CNN) variations, and one Gated CNN. We performed our experiments on a public dataset with 9339 malware samples from 25 different families. Our results show that a mere 7% increase in malware size causes an accuracy drop between 25% and 40% for malware family classification, indicating that an automatic malware classification system may not be as trustworthy as initially reported in the literature. We also evaluated using modified malware alongside the original samples to increase network robustness against the mentioned attacks. The results show that a combination of reordering malware sections and injecting random data can improve the overall performance of the classification. All the code is publicly available. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

21 pages, 5736 KiB  
Article
Fuzzy Model for the Automatic Recognition of Human Dendritic Cells
by Marwa Braiki, Kamal Nasreddine, Abdesslam Benzinou and Nolwenn Hymery
J. Imaging 2023, 9(1), 13; https://doi.org/10.3390/jimaging9010013 - 06 Jan 2023
Viewed by 1821
Abstract
Background and objective: Foodborne illness is nowadays considered one of the fastest-growing diseases in the world, and studies show that its rate increases sharply each year. It is a public health problem caused by numerous factors, such as food intoxications, allergies, and intolerances. Mycotoxins are food contaminants produced by various species of molds (fungi) and cause intoxications that can be chronic or acute; even low concentrations of mycotoxins have a severely harmful impact on human health. It is, therefore, necessary to develop an assessment tool for evaluating their impact on the immune response. Recently, researchers have adopted a new method of investigation using human dendritic cells, yet the analysis of the geometric properties of these cells is still visual. Moreover, this type of analysis is subjective, time-consuming, and difficult to perform manually. In this paper, we address the automation of this evaluation using image-processing techniques. Methods: Automatic classification approaches for microscopic dendritic cell images are developed to provide a fast and objective evaluation. The first proposed classifier is based on support vector machines (SVM) and Fisher’s linear discriminant analysis (FLD). The FLD–SVM classifier does not provide satisfactory results due to the significant confusion between the inhibited cells on one hand, and the other two cell types (mature and immature) on the other hand. A second strategy was therefore suggested to enhance dendritic cell recognition from microscopic images. This strategy is mainly based on fuzzy logic, which allows us to account for the uncertainties and inaccuracies of the given data. Results: These proposed methods were tested on a real dataset consisting of 421 images of microscopic dendritic cells, where the fuzzy classification scheme efficiently improved the classification results by successfully classifying 96.77% of the dendritic cells. Conclusions: The fuzzy classification-based tools provide cell maturity and inhibition rates which help biologists evaluate severe health impacts caused by food contaminants. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
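The first classifier, FLD followed by SVM, maps directly onto a standard scikit-learn pipeline. A sketch on synthetic stand-in features; the feature set and class structure are illustrative, not the paper's data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for morphological features of the three cell states.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 1.0, size=(140, 12)) for mu in (0.0, 1.5, 3.0)])
y = np.repeat([0, 1, 2], 140)   # 0=immature, 1=mature, 2=inhibited

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(),
                    LinearDiscriminantAnalysis(n_components=2),  # FLD projection
                    SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.3f}")
```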

15 pages, 3276 KiB  
Article
Synthetic Data Generation for Visual Detection of Flattened PET Bottles
by Vitālijs Feščenko, Jānis Ārents and Roberts Kadiķis
Mach. Learn. Knowl. Extr. 2023, 5(1), 14-28; https://doi.org/10.3390/make5010002 - 29 Dec 2022
Viewed by 2558
Abstract
Polyethylene terephthalate (PET) bottle recycling is a highly automated task; however, manual quality control is required due to inefficiencies of the process. In this paper, we explore automation of the quality control sub-task, namely visual bottle detection, using convolutional neural network (CNN)-based methods and synthetic generation of labelled training data. We propose a synthetic generation pipeline tailored for transparent and crushed PET bottle detection; however, it can also be applied to undeformed bottles if the viewpoint is set from above. We conduct various experiments on CNNs to compare the quality of real and synthetic data, show that synthetic data can reduce the amount of real data required and experiment with the combination of both datasets in multiple ways to obtain the best performance. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

12 pages, 2255 KiB  
Communication
Prediction of Carlson Trophic State Index of Small Inland Water from UAV-Based Multispectral Image Modeling
by Cheng-Yun Lin, Ming-Shiun Tsai, Jeff T. H. Tsai and Chih-Cheng Lu
Appl. Sci. 2023, 13(1), 451; https://doi.org/10.3390/app13010451 - 29 Dec 2022
Cited by 1 | Viewed by 1316
Abstract
This paper demonstrates a predictive method for spatially explicit and periodic in situ monitoring of surface water quality in a small lake using an unmanned aerial vehicle (UAV) equipped with a multispectrometer. Based on the reflectance of different substances in different spectral bands, multiple regression analyses are used to determine the models comprising the most relevant band combinations from the multispectral images for the eutrophication assessment of lake water. The relevant eutrophication parameters, such as chlorophyll a, total phosphorus, transparency, and dissolved oxygen, are thus evaluated and expressed by these regression models. Our experiments find that the predicted eutrophication parameters from the corresponding regression models generally exhibit good linear agreement, with coefficients of determination (R²) ranging from 0.7339 to 0.9406. In addition, the Carlson trophic state index (CTSI) determined from the on-site water quality sampling data is found to be rather consistent with the predictions of the regression models proposed in this research. The maximal error in CTSI accuracy is as low as 1.4%, and the root mean square error (RMSE) is only 0.6624, which reveals the great potential of low-altitude drones equipped with multispectrometers for real-time monitoring and evaluation of the trophic status of a surface water body in an ecosystem. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
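The band-combination regression at the heart of the method is straightforward to illustrate: fit a multiple linear regression from band reflectances (and ratio combinations) to a water-quality parameter and report R². The sketch below uses synthetic stand-in data; the band set, the ratios, and the target construction are all illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n = 60
# Stand-in per-site mean reflectances for green, red, red-edge and NIR bands.
bands = rng.uniform(0.02, 0.4, size=(n, 4))
green, red, red_edge, nir = bands.T
# Candidate predictors: raw bands plus ratio combinations often used for chl-a.
X = np.column_stack([green, red, red_edge, nir, nir / red, red_edge / red])
# Synthetic "ground truth" chlorophyll-a driven by the NIR/red ratio plus noise.
chl_a = 5.0 + 12.0 * (nir / red) + rng.normal(0, 1.0, n)

model = LinearRegression().fit(X, chl_a)
print(f"R^2 = {r2_score(chl_a, model.predict(X)):.4f}")
```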

13 pages, 2708 KiB  
Article
Multiscale Cascaded Attention Network for Saliency Detection Based on ResNet
by Muwei Jian, Haodong Jin, Xiangyu Liu and Linsong Zhang
Sensors 2022, 22(24), 9950; https://doi.org/10.3390/s22249950 - 16 Dec 2022
Cited by 4 | Viewed by 2863
Abstract
Saliency detection is a key research topic in the field of computer vision. Through the visual perception areas of the brain, humans can accurately and quickly be drawn to areas of interest in complex and changing scenes. Although existing saliency-detection methods achieve competent performance, they have deficiencies such as unclear margins of salient objects and interference from background information in the saliency map. In this study, to remedy these defects, a multiscale cascaded attention network was designed based on ResNet34. Different from the typical U-shaped encoding–decoding architecture, we devised a contextual feature extraction module to enhance advanced semantic feature extraction. Specifically, a multiscale cascade block (MCB) and a lightweight channel attention (CA) module were added between the encoding and decoding networks for optimization. To address the blurred-edge issue, which is neglected by many previous approaches, we adopted an edge-thinning module to carry out a deeper edge-thinning process on the output layer image. The experimental results illustrate that this method achieves competitive saliency-detection performance, with accuracy and recall improved compared with those of other representative methods. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

24 pages, 10274 KiB  
Article
Analysis of Edge Method Accuracy and Practical Multidirectional Modulation Transfer Function Measurement
by Yongjie Wu, Wei Xu, Yongjie Piao and Wei Yue
Appl. Sci. 2022, 12(24), 12748; https://doi.org/10.3390/app122412748 - 12 Dec 2022
Cited by 2 | Viewed by 2533
Abstract
The modulation transfer function (MTF) is commonly used as an imaging quality criterion reflecting the spatial resolution capability of imaging systems. The modified edge methods based on ISO Standard 12233 are widely used for MTF measurement in various imaging fields with high confidence. However, two problems in the existing edge methods limit their application in the remote sensing (RS) field, where image quality is complicated and the edge angle is usually uncontrollable: a near-horizontal or near-vertical “small tilt angle straight (STAS)” edge is required, and the MTF measurement results show low robustness and non-uniqueness. In this study, the influence of edge angle, oversampling rate (OSR), region of interest (ROI), edge contrast, and random noise on edge-method accuracy is quantitatively analyzed, and a practical multidirectional MTF measurement edge method is proposed based on the analysis results. The modified edge method adaptively determines the optimal OSR according to the edge angle and combines multiple measurement states, such as multi-ROI extraction and multi-phase binning, to improve the robustness, accuracy, and practicality of the edge method. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
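The core of any edge (slanted-edge) method is the ESF-to-LSF-to-MTF chain. The sketch below shows that chain in NumPy on a synthetic Gaussian-blurred edge, leaving out the projection, oversampling, and binning machinery the paper actually studies; the blur sigma and sampling grid are illustrative:

```python
import numpy as np
from math import erf

def mtf_from_esf(esf, dx):
    """Edge-method core: differentiate the edge spread function (ESF) to get
    the line spread function (LSF), window it, and take the normalised |FFT|."""
    lsf = np.gradient(esf, dx)
    lsf *= np.hanning(lsf.size)                 # suppress noise at the tails
    mtf = np.abs(np.fft.rfft(lsf))
    freqs = np.fft.rfftfreq(esf.size, d=dx)     # cycles per pixel
    return freqs, mtf / mtf[0]                  # normalise so MTF(0) = 1

# Synthetic edge: an ideal step blurred by a Gaussian of sigma = 1.2 px,
# sampled on an oversampled grid as an edge method would produce.
x = np.linspace(-16, 16, 513)
esf = np.array([0.5 * (1 + erf(v / (np.sqrt(2) * 1.2))) for v in x])
freqs, mtf = mtf_from_esf(esf, dx=x[1] - x[0])
print(f"MTF at {freqs[5]:.3f} cycles/px: {mtf[5]:.3f}")  # ~0.5 for this blur
```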

26 pages, 13008 KiB  
Article
MAGNet: A Camouflaged Object Detection Network Simulating the Observation Effect of a Magnifier
by Xinhao Jiang, Wei Cai, Zhili Zhang, Bo Jiang, Zhiyong Yang and Xin Wang
Entropy 2022, 24(12), 1804; https://doi.org/10.3390/e24121804 - 09 Dec 2022
Cited by 4 | Viewed by 3227
Abstract
In recent years, protecting important objects by simulating animal camouflage has been widely employed in many fields; accordingly, camouflaged object detection (COD) technology has emerged. COD is more difficult than traditional object detection due to the high degree of fusion between camouflaged objects and the background. In this paper, we strive to identify camouflaged objects more accurately and efficiently. Inspired by the use of magnifiers to search for hidden objects in pictures, we propose a COD network that simulates the observation effect of a magnifier, called the MAGnifier Network (MAGNet). Specifically, MAGNet contains two parallel modules: the ergodic magnification module (EMM) and the attention focus module (AFM). The EMM is designed to mimic the process of a magnifier enlarging an image, and the AFM is used to simulate the observation process in which human attention is highly focused on a particular region. The two sets of output camouflaged object maps are merged to simulate the observation of an object through a magnifier. In addition, a weighted key-point-area perception loss function, which is more applicable to COD, was designed on the basis of the two modules to give greater attention to the camouflaged object. Extensive experiments demonstrate that, compared with 19 cutting-edge detection models, MAGNet achieves the best overall results on eight evaluation metrics on the public COD dataset. Additionally, compared to other COD methods, MAGNet has lower computational complexity and faster segmentation. We also validated the model’s generalization ability on a military camouflaged object dataset constructed in-house. Finally, we experimentally explored some extended applications of COD. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

14 pages, 36903 KiB  
Article
Real-Time Video Synopsis via Dynamic and Adaptive Online Tube Resizing
by Xiaoxin Liao, Song Liu and Zemin Cai
Sensors 2022, 22(23), 9046; https://doi.org/10.3390/s22239046 - 22 Nov 2022
Viewed by 1227
Abstract
Nowadays, with the increased number of video cameras, the amount of recorded video is growing, and efficient video browsing and retrieval are critical issues when considering the amount of raw video data to be condensed. Activity-based video synopsis is a popular approach to the video condensation problem. However, conventional synopsis methods always consist of complicated, pairwise energy terms that involve a time-consuming optimization problem. In this paper, we propose a simple online video synopsis framework in which the collision situations of objects are classified first. Different optimization strategies are applied to different collision situations to maintain a balance among the computational cost, condensation ratio, and collision cost. Secondly, tube-resizing coefficients that vary across frames are adaptively assigned to a newly generated tube, so that a suitable mapping result can be obtained to represent the proper size of the activity in each frame of the synopsis video. The maximum number of activities can thus be displayed in one frame with minimal collisions. Finally, in order to remove motion artifacts and improve the visual quality of the condensed video, a smoothness term is introduced to constrain the resizing coefficients. Experimental results on extensive videos validate the efficiency of the proposed method. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

24 pages, 7503 KiB  
Article
Recognition of Continuous Face Occlusion Based on Block Permutation by Using Linear Regression Classification
by Jianxia Xue, Xiaojing Chen, Zhonghao Xie, Shujat Ali, Leiming Yuan, Xi Chen, Wen Shi and Guangzao Huang
Appl. Sci. 2022, 12(23), 11885; https://doi.org/10.3390/app122311885 - 22 Nov 2022
Viewed by 1292
Abstract
Face occlusion is still a key issue in the study of face recognition. Continuous occlusion affects the overall features and contour structure of a face, which brings significant challenges to face recognition. In previous studies, although Representation-Based Classification Methods (RBCM) can capture the differences between face categories well and accurately identify face images under changes in lighting and facial expression, they are easily affected by continuous occlusion: occluded regions are learned along with genuine facial characteristics, leading to misrecognition. Eliminating occlusion information from the image is therefore necessary to improve the robustness of such models. The Block Permutation Linear Regression Classification (BPLRC) method proposed in this paper combines image block permutation with Linear Regression Classification (LRC). The LRC algorithm belongs to the category of nearest-subspace classifiers and uses the Euclidean distance as the metric to classify images; as such, it is susceptible to outliers. Block permutation is therefore used to build an image set that contains little occlusion information and to construct a robust linear regression model. The BPLRC method first partitions all images into blocks, then enumerates the permutation schemes of the segments, feeds the image features of each scheme into linear models, and classifies the result according to the minimum residual between the face image and its reconstruction. Compared to several state-of-the-art algorithms, the proposed method effectively solves the continuous occlusion problem on the Extended Yale B, ORL, and AR datasets. On the AR dataset, the proposed method reaches 93.67% accuracy on face images occluded by scarves, with a recognition speed of 0.094 s per image. The permutation strategy can be combined not only with the LRC algorithm but also with other algorithms of weak robustness. Since the computational cost of the block permutation scheme grows with the number of blocks, future work should explore reasonable iteration methods that quickly find the optimal or near-optimal permutation scheme and reduce the computation of the proposed method. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
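The LRC step that BPLRC builds on is compact enough to sketch directly: represent a probe image in each class-specific subspace by least squares and pick the class with the smallest reconstruction residual. A NumPy illustration on synthetic data; the dimensions and class construction are illustrative:

```python
import numpy as np

def lrc_predict(y, class_dicts):
    """Linear Regression Classification: represent the probe y in each
    class-specific subspace and pick the class with the smallest residual."""
    residuals = []
    for X in class_dicts:                     # X: (n_pixels, n_train_images)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals.append(np.linalg.norm(y - X @ beta))
    return int(np.argmin(residuals)), residuals

rng = np.random.default_rng(0)
# Two toy classes whose images live near different low-dimensional subspaces.
basis_a, basis_b = rng.normal(size=(400, 3)), rng.normal(size=(400, 3))
class_dicts = [basis_a @ rng.normal(size=(3, 8)), basis_b @ rng.normal(size=(3, 8))]
probe = class_dicts[1] @ rng.normal(size=8) + 0.01 * rng.normal(size=400)
label, res = lrc_predict(probe, class_dicts)
print(label, [round(r, 3) for r in res])     # expects class 1
```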

14 pages, 2867 KiB  
Article
Pattern Classification Using Quantized Neural Networks for FPGA-Based Low-Power IoT Devices
by Manas Ranjan Biswal, Tahesin Samira Delwar, Abrar Siddique, Prangyadarsini Behera, Yeji Choi and Jee-Youl Ryu
Sensors 2022, 22(22), 8694; https://doi.org/10.3390/s22228694 - 10 Nov 2022
Cited by 1 | Viewed by 2032
Abstract
With the recent growth of the Internet of Things (IoT) and the demand for faster computation, quantized neural networks (QNNs), or QNN-enabled IoT, can offer better performance than conventional convolutional neural networks (CNNs). By reducing memory access costs and increasing computation efficiency, QNN-enabled devices are expected to transform numerous industrial applications with lower processing latency and power consumption. An extreme form of QNN is the binarized neural network (BNN), which restricts weights to just two quantization levels. In this paper, CNN-, QNN-, and BNN-based pattern recognition techniques are implemented and analyzed on an FPGA. The FPGA hardware acts as an IoT device through its connectivity with the cloud, and QNNs and BNNs are considered to offer better performance in terms of low power and low resource use on hardware platforms. The CNN and QNN implementations are compared based on their accuracy, weight bit error, ROC curve, and execution speed. The paper also discusses various approaches that can be deployed for optimizing CNN and QNN models with the additionally available tools. The work is performed on the Xilinx Zynq 7020 series Pynq Z2 board, which serves as our FPGA-based low-power IoT device, with the MNIST and CIFAR-10 databases used for simulation and experimentation. The results show that, at full precision (32-bit), accuracy reaches 95.5% and 79.22% on MNIST and CIFAR-10, respectively, with execution times of 5.8 ms and 18 ms. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
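The core trick behind a BNN is binarizing weights to two levels in the forward pass while training through the quantization with a straight-through estimator (STE). A minimal PyTorch sketch of that mechanism, not tied to the paper's FPGA toolflow:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Binarize weights to {-1, +1} in the forward pass; pass gradients
    straight through (clipped to |w| <= 1) in the backward pass."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1).float()

w = torch.randn(4, requires_grad=True)
loss = (BinarizeSTE.apply(w) * torch.tensor([1.0, -1.0, 1.0, -1.0])).sum()
loss.backward()
print(torch.sign(w).detach(), w.grad)   # binary values; STE gradients
```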

17 pages, 1270 KiB  
Article
Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery
by Yufen Xu, Shangbo Zhou and Yuhui Huang
Entropy 2022, 24(11), 1619; https://doi.org/10.3390/e24111619 - 06 Nov 2022
Cited by 3 | Viewed by 1958
Abstract
Convolutional neural networks have long dominated semantic segmentation of very-high-resolution (VHR) remote sensing (RS) images. However, restricted by the fixed receptive field of the convolution operation, convolution-based models cannot directly obtain contextual information. Meanwhile, the Swin Transformer possesses great potential for modeling long-range dependencies, but it breaks images into patches that form single-dimension sequences without considering the loss of positional information inside patches. Therefore, inspired by the Swin Transformer and Unet, we propose SUD-Net (Swin transformer-based Unet-like with Dynamic attention pyramid head Network), a new U-shaped architecture that combines Swin Transformer blocks and convolution layers through a dual encoder and an upsampling decoder, with a Dynamic Attention Pyramid Head (DAPH) attached to the backbone. First, we propose a dual-encoder structure combining Swin Transformer blocks and res-layers in reverse order to complement global semantics with detailed representations. Second, aiming at the spatial-loss problem inside each patch, we design a Multi-Path Fusion Model (MPFM) with a specially devised Patch Attention (PA) mechanism to encode the position information of patches and adaptively fuse features of different scales. Third, a Dynamic Attention Pyramid Head is constructed with deformable convolution to dynamically aggregate effective and important semantic information. SUD-Net achieves exceptional results on the ISPRS Potsdam and Vaihingen datasets, with 92.51% mF1, 86.4% mIoU, and 92.98% OA on Potsdam and 89.49% mF1, 81.26% mIoU, and 90.95% OA on Vaihingen. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

29 pages, 3608 KiB  
Article
Monocular Camera Viewpoint-Invariant Vehicular Traffic Segmentation and Classification Utilizing Small Datasets
by Amr Yousef, Jeff Flora and Khan Iftekharuddin
Sensors 2022, 22(21), 8121; https://doi.org/10.3390/s22218121 - 24 Oct 2022
Cited by 2 | Viewed by 1874
Abstract
The work presented here develops a view-angle-independent computer vision framework for vehicle segmentation and classification from roadway traffic systems installed by the Virginia Department of Transportation (VDOT). An automated technique for extracting a region of interest is discussed to speed up the processing. The VDOT traffic videos are analyzed for vehicle segmentation using an improved robust low-rank matrix decomposition technique, with a new and effective thresholding method that improves segmentation accuracy and simultaneously speeds up segmentation processing. Size and shape physical descriptors from morphological properties and textural features from the Histogram of Oriented Gradients (HOG) are extracted from the segmented traffic. Furthermore, a multi-class support vector machine classifier is employed to categorize different traffic vehicle types, including passenger cars, passenger trucks, motorcycles, buses, and small and large utility trucks; multiple vehicle detections are handled through an iterative k-means clustering over-segmentation process. The proposed algorithm reduced the processed data by an average of 40%. Compared to recent techniques, it showed an average improvement of 15% in segmentation accuracy and is, on average, 55% faster than the compared segmentation techniques. Moreover, a comparative analysis of 23 deep learning architectures is presented, and the resulting algorithm outperformed the compared deep learning algorithms in vehicle classification accuracy. Furthermore, the timing analysis showed that it can operate in real-time scenarios. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

19 pages, 4168 KiB  
Article
A Hyper-Chaotically Encrypted Robust Digital Image Watermarking Method with Large Capacity Using Compress Sensing on a Hybrid Domain
by Zhen Yang, Qingwei Sun, Yunliang Qi, Shouliang Li and Fengyuan Ren
Entropy 2022, 24(10), 1486; https://doi.org/10.3390/e24101486 - 18 Oct 2022
Cited by 6 | Viewed by 1908
Abstract
The digital watermarking technique is quite promising for both image copyright protection and secure transmission. However, many existing techniques fall short of expectations in robustness and capacity simultaneously. In this paper, we propose a robust semi-blind image watermarking scheme with high capacity. Firstly, we apply a discrete wavelet transform (DWT) to the carrier image. Then, the watermark images are compressed via a compressive sampling technique to save storage space. Thirdly, a Combination of One- and Two-Dimensional Chaotic Maps based on the Tent and Logistic maps (TL-COTDCM) is used to scramble the compressed watermark image with high security, dramatically reducing the false positive problem (FPP). Finally, singular value decomposition (SVD) is used to embed the result into the decomposed carrier image, completing the embedding process. With this scheme, eight 256×256 grayscale watermark images are perfectly embedded into a 512×512 carrier image, a capacity eight times that of existing watermarking techniques on average. The scheme has been tested under several common attacks of high strength, and the experimental results show the superiority of our method on the two most used evaluation indicators, the normalized correlation coefficient (NCC) and the peak signal-to-noise ratio (PSNR). Our method outperforms the state of the art in the robustness, security, and capacity of digital watermarking and thus exhibits great potential for multimedia applications in the immediate future. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
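The DWT-plus-SVD backbone of the embedding step can be sketched in a few lines, omitting the compressive-sampling and TL-COTDCM scrambling stages. The subband choice, wavelet, and embedding strength alpha below are illustrative assumptions, with pywt assumed available:

```python
import numpy as np
import pywt

def embed(carrier, watermark, alpha=0.05):
    """DWT-SVD embedding sketch: modify the singular values of the
    carrier's LL subband with the watermark's singular values."""
    LL, (LH, HL, HH) = pywt.dwt2(carrier, "haar")
    U, S, Vt = np.linalg.svd(LL, full_matrices=False)
    Sw = np.linalg.svd(watermark, compute_uv=False)
    S_marked = S + alpha * Sw                 # embed in the singular values
    LL_marked = U @ np.diag(S_marked) @ Vt
    return pywt.idwt2((LL_marked, (LH, HL, HH)), "haar")

rng = np.random.default_rng(0)
carrier = rng.uniform(0, 255, (512, 512))
watermark = rng.uniform(0, 255, (256, 256))   # LL of a 512x512 carrier is 256x256
marked = embed(carrier, watermark)
print(marked.shape, float(np.abs(marked - carrier).mean()))
```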

14 pages, 1854 KiB  
Article
Decoupled Early Time Series Classification Using Varied-Length Feature Augmentation and Gradient Projection Technique
by Huiling Chen, Ye Zhang, Aosheng Tian, Yi Hou, Chao Ma and Shilin Zhou
Entropy 2022, 24(10), 1477; https://doi.org/10.3390/e24101477 - 17 Oct 2022
Viewed by 1509
Abstract
Early time series classification (ETSC) is crucial for real-world time-sensitive applications. The task aims to classify time series data using the fewest possible timestamps at the desired accuracy. Early methods used fixed-length time series to train deep models and then quit the classification process via specific exiting rules, but such methods may not adapt to the length variation of streaming data in ETSC. Recent advances have proposed end-to-end frameworks that leverage recurrent neural networks to handle varied-length inputs and exiting subnets for early quitting. Unfortunately, the conflict between the classification and early-exiting objectives is not fully considered. To handle these problems, we decouple the ETSC task into a varied-length TSC task and an early-exiting task. First, to enhance the adaptive capacity of the classification subnets to length variation in the data, a feature augmentation module based on random-length truncation is proposed. Then, to handle the conflict between classification and early exiting, the gradients of the two tasks are projected into a unified direction. Experimental results on 12 public datasets demonstrate the promising performance of our proposed method. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
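The gradient-projection step can be pictured with a PCGrad-style rule: when the classification and early-exiting gradients conflict, remove from each its component along the other before combining them. The sketch below is a generic illustration of this family of techniques under that assumption, not the authors' exact procedure:

```python
import torch

def project_conflicting(g1, g2):
    """PCGrad-style combination of two task gradients: if they conflict
    (negative inner product), project each onto the other's normal plane."""
    dot = torch.dot(g1, g2)
    if dot >= 0:
        return g1 + g2                        # no conflict: plain sum
    g1p = g1 - dot / g2.dot(g2) * g2          # drop g1's component along g2
    g2p = g2 - dot / g1.dot(g1) * g1          # drop g2's component along g1
    return g1p + g2p

g_cls = torch.tensor([1.0, 2.0])              # classification gradient
g_exit = torch.tensor([1.0, -3.0])            # early-exiting gradient (conflicts)
update = project_conflicting(g_cls, g_exit)
print(update, torch.dot(g_cls, g_exit))       # combined non-conflicting step
```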

17 pages, 20989 KiB  
Article
A Fast Adaptive Multi-Scale Kernel Correlation Filter Tracker for Rigid Object
by Kaiyuan Zheng, Zhiyong Zhang and Changzhen Qiu
Sensors 2022, 22(20), 7812; https://doi.org/10.3390/s22207812 - 14 Oct 2022
Cited by 1 | Viewed by 1604
Abstract
The efficient and accurate tracking of a target in complex scenes has always been a challenging task. At present, the most effective tracking algorithms are neural network models based on deep learning. Although such algorithms have high tracking accuracy, the huge number of parameters and computations in the network models makes it difficult for them to meet real-time requirements under limited hardware conditions, such as embedded platforms with small size, low power consumption, and limited computing power. Tracking algorithms based on kernel correlation filters are well known and widely applied because of their high performance and speed, but when the target is in a complex background, they still cannot adapt to target scale changes and occlusion, which leads to template drift. In this paper, a fast multi-scale kernel correlation filter tracker based on adaptive template updating is proposed for common rigid targets. We introduce a simple scale pyramid on the basis of Kernel Correlation Filtering (KCF), which can adapt to changes in target size while preserving operating speed. We also propose an adaptive template updater based on the Mean of Cumulative Maximum Response Values (MCMRV) to effectively alleviate template drift when occlusion occurs. Extensive experiments demonstrate the effectiveness of our method on various datasets, where it significantly outperformed other state-of-the-art methods based on kernel correlation filters. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

16 pages, 5225 KiB  
Article
Instrument Pointer Recognition Scheme Based on Improved CSL Algorithm
by Hailong Liu, Jielin Wang and Bo Ma
Sensors 2022, 22(20), 7800; https://doi.org/10.3390/s22207800 - 14 Oct 2022
Cited by 1 | Viewed by 1399
Abstract
The traditional pointer instrument recognition scheme is implemented in three steps, which is cumbersome and inefficient, so it is difficult to apply to real-time monitoring in industrial production. Based on an improved CSL coding method and a pre-cache mechanism, an intelligent YOLOv5-based reading recognition technique for pointer instruments is proposed in this paper, which achieves rapid positioning and reading recognition of pointer instruments. With this strategy, the angle boundary problem in rotated target detection is eliminated, the complexity of image preprocessing is avoided, and the poor adaptability of Hough-transform detection is overcome. The experimental results show that, compared with the traditional algorithm, the proposed algorithm can effectively identify the angle of the instrument pointer, has high detection efficiency and strong adaptability, and has broad application prospects. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
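CSL here refers to the Circular Smooth Label scheme for angle prediction in rotated-object detection, which recasts angle regression as classification over discrete bins with a circular window so that bins near the true angle receive soft credit. A minimal NumPy sketch; the bin count and window radius are illustrative:

```python
import numpy as np

def circular_smooth_label(angle_deg, num_bins=180, radius=6.0):
    """Encode an angle as a circular Gaussian window over discrete bins,
    so bins adjacent to the true angle (mod 180) get soft credit."""
    bins = np.arange(num_bins)
    center = int(round(angle_deg)) % num_bins
    d = np.minimum(np.abs(bins - center),
                   num_bins - np.abs(bins - center))   # circular distance
    label = np.exp(-d ** 2 / (2 * radius ** 2))
    label[d > radius] = 0.0                            # truncate the window
    return label

lbl = circular_smooth_label(179.0)
print(lbl[179], lbl[0], lbl[90])  # the window wraps from 179 deg around to 0 deg
```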

22 pages, 11383 KiB  
Article
Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers
by Yi Liu, Xintao Xu, Bajian Xiang, Gang Chen, Guoliang Gong and Huaxiang Lu
Sensors 2022, 22(19), 7577; https://doi.org/10.3390/s22197577 - 06 Oct 2022
Cited by 2 | Viewed by 1540
Abstract
Depth estimation algorithms based on convolutional neural networks have many limitations and defects when computing disparity by constructing a matching cost volume: with a limited disparity range, authentic disparities beyond the predetermined range cannot be acquired; the matching process lacks constraints on occlusion and matching uniqueness; and, as a local feature extractor, a convolutional neural network lacks the ability to perceive global context. Aiming at these problems of cost-volume-based matching, we propose a disparity prediction algorithm based on the Transformer, which comprises a Swin-SPP module for feature extraction based on the Swin Transformer, a Transformer disparity matching network based on self-attention and cross-attention mechanisms, and an occlusion prediction sub-network. In addition, we propose a double-skip-connection fully connected layer to address gradient vanishing and explosion during training of the Transformer model, further enhancing inference accuracy. The proposed model achieves an EPE (absolute error) of 0.57 and 0.61 and a 3PE (percentage of errors greater than 3 px) of 1.74% and 1.56% on the KITTI 2012 and KITTI 2015 datasets, respectively, with an inference time of 0.46 s and as few as 2.6 M parameters, showing great advantages over other algorithms on various evaluation metrics. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

14 pages, 6462 KiB  
Article
An Improved Adaptive Median Filtering Algorithm for Radar Image Co-Channel Interference Suppression
by Nuozhou Li, Tong Liu and Hangqi Li
Sensors 2022, 22(19), 7573; https://doi.org/10.3390/s22197573 - 06 Oct 2022
Cited by 1 | Viewed by 1899
Abstract
In order to increase the accuracy of ocean monitoring, this paper proposes an improved adaptive median filtering algorithm based on the tangential interference ratio to better suppress marine radar co-channel interference. To address the problem that co-channel interference reduces the accuracy of parameter extraction from radar images, a tangential interference ratio model is constructed based on an improved Laplace operator and used to describe the ratio of co-channel interference along the antenna rotation direction in the original radar image. Based on the idea of between-class variance, a tangential interference ratio threshold is selected to divide co-channel interference into high-ratio and low-ratio regions. An improved adaptive median filter based on the medians of sub-windows is then used to process the high-ratio regions, while the low-ratio regions are processed by an adaptive median filter based on the medians of the current windows. Radar data measured in Bohai Bay, China are used for validation, and the experimental results show that the proposed filtering algorithm performs better than the standard adaptive median filtering algorithm. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
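For context, the baseline being improved is the classic adaptive median filter, sketched below in NumPy. The maximum window size and the test image are illustrative:

```python
import numpy as np

def adaptive_median(img, max_win=7):
    """Classic adaptive median filter: per pixel, grow the window until the
    window median is not an extreme value (stage A), then replace the pixel
    only if it is itself extreme, i.e. impulse-like (stage B)."""
    pad = max_win // 2
    padded = np.pad(img, pad, mode="reflect")
    out = img.astype(float).copy()
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            for win in range(3, max_win + 1, 2):
                r = win // 2
                patch = padded[i + pad - r:i + pad + r + 1,
                               j + pad - r:j + pad + r + 1]
                zmin, zmed, zmax = patch.min(), np.median(patch), patch.max()
                if zmin < zmed < zmax:                    # stage A passed
                    if not (zmin < img[i, j] < zmax):     # stage B: impulse
                        out[i, j] = zmed
                    break
            else:                                          # window hit max size
                out[i, j] = zmed
    return out

rng = np.random.default_rng(0)
clean = np.tile(np.arange(32, dtype=float) * 3.0, (32, 1))    # smooth ramp image
noisy = clean.copy()
mask = rng.random(clean.shape) < 0.1
noisy[mask] = rng.choice([0.0, 255.0], size=int(mask.sum()))  # salt & pepper
print(np.abs(adaptive_median(noisy) - clean).mean())          # small residual error
```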

17 pages, 5488 KiB  
Article
Detection of Pits by Conjugate Lines: An Algorithm for Segmentation of Overlapping and Adhesion Targets in DE-XRT Sorting Images of Coal and Gangue
by Lei He, Shuang Wang and Yongcun Guo
Appl. Sci. 2022, 12(19), 9850; https://doi.org/10.3390/app12199850 - 30 Sep 2022
Cited by 1 | Viewed by 1426
Abstract
In lump coal and gangue separation based on photoelectric technology, the prerequisite for using a dual-energy X-ray to locate and identify coal and gangue is obtaining independent target areas. However, as the throughput of the sorting system increases, the actually collected images contain adherent and overlapping targets. This paper proposes a pit-point detection and segmentation algorithm to solve the problem of overlapping and adherent targets. The adhesion forms are divided into open-loop and closed-loop adhesion (OLA and CLA), and an open- and closed-loop crossing algorithm (OLCA and CLCA) is proposed accordingly. We use conjugate lines to detect pits, judging the position and distance of each pixel relative to the conjugate lines, and set constraints on the pixel distance and relative straight-line position to complete the pit detection. Finally, a minimum-distance search is used to obtain the dividing line corresponding to each pit and complete the image segmentation. The experimental results demonstrate that the segmentation accuracy for overlapping targets was 90.73%, and the acceptable segmentation accuracy was 94.15%. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

18 pages, 1770 KiB  
Article
The Study of the Effectiveness of Advanced Algorithms for Learning Neural Networks Based on FPGA in the Musical Notation Classification Task
by Sławomir Sokół, Dawid Pawuś, Paweł Majewski and Marek Krok
Appl. Sci. 2022, 12(19), 9829; https://doi.org/10.3390/app12199829 - 29 Sep 2022
Cited by 4 | Viewed by 1365
Abstract
This work contains an original comparison, on the image identification task, of selected algorithms using artificial neural network models, such as RBF neural networks, with classic algorithms based on structured programming. Existing studies exploring methods for the musical notation classification problem addressed in this work are still scarce. The neural network based and classical image recognition methods were compared on the effectiveness of recognizing notes presented on the treble staff. To carry out the research, the density of the data distribution was modeled by means of probabilistic principal component analysis, and a simple regression was performed with a radial basis function network. The methods of image acquisition and analysis are presented, and the obtained results were successively tested against selected quality criteria. The development of this research may contribute to supporting the learning of musical notation by both beginners and blind people, and its further development can enable convenient reading of musical notation with the help of a classification system. The research also introduces new algorithms for further tests and projects in the field of music notation classification. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

18 pages, 7682 KiB  
Article
An Efficient Retrieval System Framework for Fabrics Based on Fine-Grained Similarity
by Jun Xiang, Ruru Pan and Weidong Gao
Entropy 2022, 24(9), 1319; https://doi.org/10.3390/e24091319 - 19 Sep 2022
Cited by 1 | Viewed by 1637
Abstract
In the context of “double carbon”, the textile industry, as a traditionally energy-intensive industry, faces severe challenges in energy saving and emission reduction. To improve production efficiency in the textile industry, we propose using content-based image retrieval to shorten the fabric production cycle. However, fabric retrieval places high demands on the quality of results, which makes it difficult to apply common retrieval methods directly. This paper presents a novel method for fabric image retrieval. Firstly, we define a fine-grained similarity to measure the similarity between two fabric images. Then, a convolutional neural network with a compact structure and cross-domain connections is designed to map fabric images to these similarities. To overcome the missing probabilistic interpretation and the training difficulties of classical hashing, we introduce a variational network module and a structural module into the hashing model, called DVSH. We employ list-wise learning to perform the similarity embedding. The experimental results demonstrate the superiority and efficiency of the proposed hashing model, DVSH.
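A generic deep-hashing sketch (not the authors' DVSH): a hashing head relaxes binary codes with tanh, and a list-wise loss aligns code similarities with a given fine-grained similarity matrix; dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    def __init__(self, in_dim: int = 512, bits: int = 48):
        super().__init__()
        self.fc = nn.Linear(in_dim, bits)

    def forward(self, feats):                 # feats: (B, in_dim) backbone features
        return torch.tanh(self.fc(feats))     # relaxed binary codes in (-1, 1)

def listwise_similarity_loss(codes, S):
    """MSE between cosine similarity of codes and target similarities S (B, B) in [0, 1]."""
    sim = F.cosine_similarity(codes.unsqueeze(1), codes.unsqueeze(0), dim=-1)
    return F.mse_loss((sim + 1) / 2, S)       # map cosine [-1, 1] to [0, 1]

codes = HashHead()(torch.randn(8, 512))
loss = listwise_similarity_loss(codes, torch.rand(8, 8))
loss.backward()
```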
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

27 pages, 5624 KiB  
Article
Supervised Contrastive Learning and Intra-Dataset Adversarial Adaptation for Iris Segmentation
by Zhiyong Zhou, Yuanning Liu, Xiaodong Zhu, Shuai Liu, Shaoqiang Zhang and Yuanfeng Li
Entropy 2022, 24(9), 1276; https://doi.org/10.3390/e24091276 - 10 Sep 2022
Cited by 5 | Viewed by 1807
Abstract
Precise iris segmentation is a vital part of accurate iris recognition. Traditional iris segmentation methods require complex prior knowledge and pre- and post-processing, and they have limited accuracy under non-ideal conditions. Deep learning approaches outperform traditional methods; however, the small number of labeled datasets, owing to the difficulty of collecting and labeling iris images, degrades their performance drastically. Furthermore, previous approaches ignore the large distribution gap within a non-ideal iris dataset caused by illumination, motion blur, squinting eyes, etc. To address these issues, we propose a three-stage training strategy. Firstly, supervised contrastive pretraining is proposed to increase intra-class compactness and inter-class separability, yielding a good pixel classifier under a limited amount of data. Secondly, the entire network is fine-tuned using the cross-entropy loss. Thirdly, an intra-dataset adversarial adaptation is proposed, which reduces the intra-dataset gap in non-ideal situations by aligning the distributions of hard and easy samples at the pixel-class level. Our experiments show that the method improves segmentation performance, achieving Nice1 scores of 0.44%, 1.03%, 0.66%, 0.41%, and 0.37% and F1 scores of 96.66%, 98.72%, 93.21%, 94.28%, and 97.41% on UBIRIS.V2, IITD, MICHE-I, CASIA-D, and CASIA-T, respectively.
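A minimal sketch of a supervised contrastive loss of the kind used for the pretraining stage, written at the embedding level rather than the paper's pixel-class level:

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature: float = 0.1):
    """features: (N, D) embeddings; labels: (N,) class ids; pulls same-class pairs together."""
    feats = F.normalize(features, dim=1)
    logits = feats @ feats.t() / temperature               # (N, N) similarities
    mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    self_mask = torch.eye(len(labels), dtype=torch.bool)
    mask[self_mask] = 0                                    # drop self-pairs
    logits.masked_fill_(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = mask.sum(1).clamp(min=1)
    return -(mask * log_prob).sum(1).div(pos_count).mean()

loss = supcon_loss(torch.randn(16, 128), torch.randint(0, 2, (16,)))
```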
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

15 pages, 399 KiB  
Article
An Enhanced Scheme for Reducing the Complexity of Pointwise Convolutions in CNNs for Image Classification Based on Interleaved Grouped Filters without Divisibility Constraints
by Joao Paulo Schwarz Schuler, Santiago Romani Also, Domenec Puig, Hatem Rashwan and Mohamed Abdel-Nasser
Entropy 2022, 24(9), 1264; https://doi.org/10.3390/e24091264 - 08 Sep 2022
Cited by 3 | Viewed by 2098
Abstract
In image classification with Deep Convolutional Neural Networks (DCNNs), the number of parameters in pointwise convolutions grows rapidly due to the multiplication of the number of filters by the number of input channels coming from the previous layer. Existing studies demonstrated that a subnetwork can replace pointwise convolutional layers with significantly fewer parameters and fewer floating-point computations while maintaining the learning capacity. In this paper, we propose an improved scheme for reducing the complexity of pointwise convolutions in DCNNs for image classification, based on interleaved grouped filters without divisibility constraints. The proposed scheme uses grouped pointwise convolutions, in which each group processes a fraction of the input channels, and takes the number of channels per group as a hyperparameter, Ch. Its subnetwork contains two consecutive convolutional layers, K and L, connected by an interleaving layer in the middle and summed at the end. The numbers of filter groups and of filters per group for layers K and L are determined by exact division of the original numbers of input channels and filters by Ch; if those divisions are not exact, the original layer cannot be substituted. In this paper, we refine the previous algorithm so that input channels are replicated and groups can have different numbers of filters to cope with non-exact divisibility. The proposed scheme thereby further reduces the floating-point computations (by 11%) and trainable parameters (by 10%) relative to the previous method. We tested our optimization on EfficientNet-B0 as a baseline architecture and ran classification tests on the CIFAR-10, Colorectal Cancer Histology, and Malaria datasets. On these datasets, our optimization saves 76%, 89%, and 91% of EfficientNet-B0's trainable parameters, respectively, while keeping its test classification accuracy.
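A sketch of the subnetwork pattern described, assuming exact divisibility (the paper's channel-replication trick for non-divisible cases is not reproduced): two grouped 1×1 convolutions K and L, an interleaving (channel shuffle) layer between them, and a summed output:

```python
import torch
import torch.nn as nn

class GroupedPointwise(nn.Module):
    def __init__(self, channels: int = 64, ch_per_group: int = 16):
        super().__init__()
        groups = channels // ch_per_group      # Ch channels per group, exact division
        self.k = nn.Conv2d(channels, channels, 1, groups=groups, bias=False)
        self.l = nn.Conv2d(channels, channels, 1, groups=groups, bias=False)
        self.groups = groups

    def interleave(self, x):
        b, c, h, w = x.shape                   # channel shuffle across groups
        return x.view(b, self.groups, c // self.groups, h, w) \
                .transpose(1, 2).reshape(b, c, h, w)

    def forward(self, x):
        y = self.k(x)
        return self.l(self.interleave(y)) + y  # interleave, convolve, sum

out = GroupedPointwise()(torch.randn(2, 64, 32, 32))
```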
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

14 pages, 4752 KiB  
Article
Full-Scale Fire Smoke Root Detection Based on Connected Particles
by Xuhong Feng, Pengle Cheng, Feng Chen and Ying Huang
Sensors 2022, 22(18), 6748; https://doi.org/10.3390/s22186748 - 07 Sep 2022
Cited by 2 | Viewed by 1520
Abstract
Smoke is an early visual phenomenon of forest fires, and its timely detection is of great significance for early warning systems. However, most existing smoke detection algorithms have varying levels of accuracy over different distances. This paper proposes a new smoke root detection algorithm that integrates the static and dynamic features of smoke and detects the final smoke root based on clustering and the circumcircle. Compared with existing methods, the newly developed method has higher accuracy and detection efficiency at full scale, indicating a wider range of applications for quicker detection of smoke in forests and the prevention of potential forest fire spread.
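A rough sketch of the clustering-plus-circumcircle step; the candidate smoke pixels are assumed to come from the paper's static/dynamic feature analysis, which is not reproduced, and the lowest cluster is taken as the root region:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def smoke_root_circle(candidate_points: np.ndarray, n_clusters: int = 3):
    """candidate_points: (N, 2) array of (x, y) candidate smoke pixels."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(candidate_points)
    # The root sits near the bottom of the plume: pick the cluster with max mean y.
    root_label = max(range(n_clusters),
                     key=lambda k: candidate_points[labels == k][:, 1].mean())
    pts = candidate_points[labels == root_label].astype(np.float32)
    (cx, cy), r = cv2.minEnclosingCircle(pts)   # circumcircle of the root cluster
    return (cx, cy), r

center, radius = smoke_root_circle(np.random.rand(300, 2).astype(np.float32) * 480)
```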
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

32 pages, 8421 KiB  
Article
Infrared Target Detection Based on Joint Spatio-Temporal Filtering and L1 Norm Regularization
by Enyong Xu, Anqing Wu, Juliu Li, Huajin Chen, Xiangsuo Fan and Qibai Huang
Sensors 2022, 22(16), 6258; https://doi.org/10.3390/s22166258 - 20 Aug 2022
Cited by 3 | Viewed by 1501
Abstract
Infrared target detection is often disrupted by complex backgrounds, resulting in a high false-alarm rate and a low target recognition rate. This paper proposes a robust principal component decomposition model with joint spatial and temporal filtering and L1-norm regularization to effectively suppress complex backgrounds. The model establishes a new anisotropic Gaussian kernel diffusion function, which exploits the difference between the target and the background in the spatial domain to suppress edge contours. Furthermore, to suppress dynamically changing backgrounds, we construct an inversion model that combines temporal-domain information and L1-norm regularization to globally constrain the low-rank characteristics of the background, and we characterize the sparse target component with the L1 norm. Finally, the overlapping multiplier method is used for decomposition and reconstruction to complete the target detection. In the experiments, the proposed background modeling method achieves better background suppression in different scenes, with average SSIM, BSF, and IC values of 0.986, 88.357, and 18.967, respectively. Meanwhile, the proposed method obtains a higher detection rate than other algorithms at the same false-alarm rate.
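A compact sketch of the underlying robust principal component decomposition (low-rank background plus L1-sparse targets) via alternating shrinkage; the paper's spatio-temporal filtering terms are omitted:

```python
import numpy as np

def rpca(D, lam=None, n_iter=100, mu=1.0):
    """Split data matrix D into low-rank L (background) + sparse S (targets)."""
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(n_iter):
        # Low-rank step: singular-value thresholding of D - S + Y/mu.
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # Sparse step: element-wise soft-thresholding (the L1 term).
        R = D - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y += mu * (D - L - S)      # dual update
    return L, S

frames = np.random.rand(64 * 64, 30)   # columns: vectorized infrared frames
background, targets = rpca(frames)
```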
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

11 pages, 1357 KiB  
Article
Identifying Coastal Wetlands Changes Using a High-Resolution Optical Images Feature Hierarchical Selection Method
by Ruijuan Wu and Jing Wang
Appl. Sci. 2022, 12(16), 8297; https://doi.org/10.3390/app12168297 - 19 Aug 2022
Cited by 1 | Viewed by 1111
Abstract
Coastal wetlands are dynamic and fragile ecosystems in which complex changes have taken place. As they are affected by environmental change and human activities, regular monitoring of coastal wetland changes is of great practical significance. High-resolution optical data can capture changes in coastal wetlands; however, the impact of different optical features on the identification of those changes is not clear. At the same time, combining many features can cause the “curse of dimensionality” problem, and only small numbers of training samples are available at the pre- or post-change time. To solve these problems, a feature hierarchical selection method is proposed that accounts for the jumping degree of different image features, and the influence of different optical features on wetland classification is analyzed. In addition, a training-sample transfer learning strategy was designed for wetland classification, and the classification results at the pre- and post-change times were compared to identify the “from-to” coastal wetland changes. The southeastern coastal wetlands of Jiangsu Province were used as the study area, and ZY-3 images from 2013 and 2018 were used to verify the proposed methods. The results show that the feature hierarchical selection method provides a quantitative reference for selecting an optimal feature subset. The transfer learning strategy was used to classify the post-change optical data; the overall accuracy with transferred training samples was 91.16%, which meets the accuracy requirements for change identification. In the study area, salt marshes increased mainly at the expense of sea area, because salt marshes expand rapidly throughout coastal areas, while aquaculture ponds increased from sea area and salt marshes because of the considerable economic benefits of the aquaculture industry.
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

19 pages, 5671 KiB  
Article
Smartphone Camera Identification from Low-Mid Frequency DCT Coefficients of Dark Images
by Adriana Berdich and Bogdan Groza
Entropy 2022, 24(8), 1158; https://doi.org/10.3390/e24081158 - 19 Aug 2022
Cited by 2 | Viewed by 1914
Abstract
Camera sensor identification has numerous forensics and authentication applications. In this work, we follow an identification methodology for smartphone camera sensors using properties of the Dark Signal Nonuniformity (DSNU) in the collected images. This requires taking dark pictures, which users can easily do by holding the phone against their palm, an approach that has already been proposed in various works. From such pictures, we extract low- and mid-frequency AC coefficients of the DCT (Discrete Cosine Transform) and classify the data with machine learning techniques. Traditional algorithms such as KNN (K-Nearest Neighbors) give reasonable classification results, but we obtain the best results with a wide neural network which, despite its simplicity, surpassed even a more complex network architecture that we tried. Our analysis showed that the blue channel provided the best separation, in contrast to previous works that recommended the green channel for its higher encoding power.
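A sketch of the described feature pipeline: 2-D DCT of a dark frame, low/mid-frequency AC coefficients in a zig-zag-like order, and a KNN classifier; the index range and data are illustrative, not the paper's exact selection:

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.neighbors import KNeighborsClassifier

def dct_ac_features(img: np.ndarray, start: int = 1, stop: int = 200):
    """img: 2-D array (one channel of a dark photo); returns low/mid AC coefficients."""
    coeffs = dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")
    h, w = coeffs.shape
    order = sorted(((i, j) for i in range(h) for j in range(w)),
                   key=lambda ij: (ij[0] + ij[1], ij))   # anti-diagonal (zig-zag-like)
    flat = np.array([coeffs[i, j] for i, j in order])
    return flat[start:stop]                              # skip the DC term

# X: features from many dark frames; y: camera identity labels (assumed data).
X = np.stack([dct_ac_features(np.random.rand(64, 64)) for _ in range(40)])
y = np.repeat(np.arange(4), 10)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
```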
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

20 pages, 33926 KiB  
Article
MDS-Net: Multi-Scale Depth Stratification 3D Object Detection from Monocular Images
by Zhouzhen Xie, Yuying Song, Jingxuan Wu, Zecheng Li, Chunyi Song and Zhiwei Xu
Sensors 2022, 22(16), 6197; https://doi.org/10.3390/s22166197 - 18 Aug 2022
Cited by 1 | Viewed by 1836
Abstract
Monocular 3D object detection is very challenging in autonomous driving due to the lack of depth information. This paper proposes a one-stage monocular 3D object detection network (MDS-Net), which uses an anchor-free method to detect 3D objects in a per-pixel prediction manner. Firstly, a novel depth-based stratification structure is developed to improve the network's depth prediction, exploiting the mathematical relationship between an object's size and its depth in the image based on the pinhole model. Secondly, a new angle loss function is developed to improve both the accuracy of the angle prediction and the convergence speed of training. An optimized Soft-NMS is finally applied in the post-processing stage to adjust the confidence scores of the candidate boxes. Experimental results on the KITTI benchmark demonstrate that the proposed MDS-Net outperforms existing monocular 3D detection methods in both the 3D detection and BEV detection tasks while fulfilling real-time requirements.
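A plain Gaussian Soft-NMS sketch of the post-processing family the paper optimizes (the paper's specific modification is not reproduced):

```python
import numpy as np

def iou(box, boxes):
    """box: (4,), boxes: (N, 4) as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    boxes, scores = boxes.copy(), scores.copy()
    keep = []
    while len(boxes):
        i = scores.argmax()
        keep.append((boxes[i], scores[i]))
        ious = iou(boxes[i], boxes)
        scores = scores * np.exp(-(ious ** 2) / sigma)   # decay overlaps, don't delete
        mask = scores > score_thresh; mask[i] = False
        boxes, scores = boxes[mask], scores[mask]
    return keep

dets = soft_nms(np.array([[0, 0, 10, 10], [1, 1, 11, 11.]]), np.array([0.9, 0.8]))
```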
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

13 pages, 3467 KiB  
Article
Diffraction Enhanced Imaging Analysis with Pseudo-Voigt Fit Function
by Deepak Mani, Andreas Kupsch, Bernd R. Müller and Giovanni Bruno
J. Imaging 2022, 8(8), 206; https://doi.org/10.3390/jimaging8080206 - 23 Jul 2022
Cited by 6 | Viewed by 1822
Abstract
Diffraction enhanced imaging (DEI) is an advanced digital radiographic imaging technique that employs the refraction of X-rays to contrast internal interfaces. This study aims to qualitatively and quantitatively evaluate images acquired using this technique and to assess how different functions fitted to the typical rocking curves (RCs) influence image quality. RCs are obtained for every image pixel, which allows the absorption and refraction properties of the material to be determined separately in a position-sensitive manner. A comparison of various fitting functions reveals that the Pseudo-Voigt (PsdV) function is best suited to fit typical RCs. A robust algorithm was developed in the Python programming language that reliably extracts the physically meaningful information from each pixel of the image. We demonstrate the potential of the algorithm with two specimens: a silicone gel specimen with well-defined interfaces, and an additively manufactured polycarbonate specimen.
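A minimal pseudo-Voigt fit of a synthetic rocking curve with SciPy, of the kind such a Python pipeline applies per pixel; the parameterization below is one common convention, not necessarily the authors' exact one:

```python
import numpy as np
from scipy.optimize import curve_fit

def pseudo_voigt(x, amp, x0, fwhm, eta, offset):
    """Linear mix of a Gaussian and a Lorentzian with equal FWHM (0 <= eta <= 1)."""
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    gauss = np.exp(-((x - x0) ** 2) / (2 * sigma ** 2))
    lorentz = 1 / (1 + ((x - x0) / (fwhm / 2)) ** 2)
    return offset + amp * (eta * lorentz + (1 - eta) * gauss)

theta = np.linspace(-20, 20, 201)                    # rocking angle
rc = pseudo_voigt(theta, 1.0, 1.5, 6.0, 0.3, 0.05)   # synthetic rocking curve
rc += np.random.normal(0, 0.01, theta.size)          # measurement noise
p0 = [rc.max(), theta[rc.argmax()], 5.0, 0.5, rc.min()]
popt, _ = curve_fit(pseudo_voigt, theta, rc, p0=p0)
# The fitted peak shift popt[1] carries the refraction signal; the amplitude
# popt[0] relates to absorption, extracted pixel by pixel.
```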
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

14 pages, 3788 KiB  
Article
Crosstalk Defect Detection Method Based on Salient Color Channel Frequency Domain Filtering
by Wenqiang Xie, Huaixin Chen, Zhixi Wang, Xing Liu, Biyuan Liu and Lingyu Shuai
Sensors 2022, 22(14), 5426; https://doi.org/10.3390/s22145426 - 20 Jul 2022
Viewed by 2000
Abstract
Display crosstalk defect detection is an important step in the display quality inspection process. We propose a crosstalk defect detection method based on salient-color-channel frequency domain filtering. Firstly, the salient color channel among RGBY is selected by the maximum relative entropy criterion, and the color quaternion matrix of the displayed image is formed together with the Lab color space. Secondly, the color quaternion matrix is converted into a logarithmic spectrum in the frequency domain through the hyper-complex Fourier transform. Finally, Gaussian threshold band-pass filtering and the hyper-complex inverse Fourier transform are used to separate low-contrast defects from the background of the display image. Experimental results show that the accuracy of the proposed algorithm reaches 96% over a variety of crosstalk defects. Comparisons with current advanced defect detection algorithms confirm the effectiveness of the proposed method for low-contrast crosstalk defect detection.
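A simplified single-channel sketch of the frequency-domain band-pass step; the paper operates on a color quaternion matrix with a hyper-complex Fourier transform, which this ordinary FFT version only approximates:

```python
import numpy as np

def gaussian_bandpass(img: np.ndarray, low: float = 5.0, high: float = 40.0):
    """Keep spatial frequencies between `low` and `high` (frequency-bin radii)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    r2 = xx ** 2 + yy ** 2
    band = np.exp(-r2 / (2 * high ** 2)) - np.exp(-r2 / (2 * low ** 2))
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * band)))

defect_map = gaussian_bandpass(np.random.rand(256, 256))  # low-contrast residue
```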
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

17 pages, 2863 KiB  
Article
Non-Negative Matrix Factorization Based on Smoothing and Sparse Constraints for Hyperspectral Unmixing
by Xiangxiang Jia and Baofeng Guo
Sensors 2022, 22(14), 5417; https://doi.org/10.3390/s22145417 - 20 Jul 2022
Cited by 3 | Viewed by 1664
Abstract
Hyperspectral unmixing (HU) is a technique for estimating a set of pure source signals (end members) and their proportions (abundances) from each pixel of a hyperspectral image. Non-negative matrix factorization (NMF) can decompose the observation matrix into the product of two non-negative matrices simultaneously and can be used for HU. Unfortunately, a limitation of many traditional NMF-based methods, namely the non-convexity of the objective function, may lead to a sub-optimal solution. We therefore put forward a new unmixing method based on NMF under smoothing and sparse constraints to obtain a better solution. First, considering the sparseness of the abundance matrix, a weighted sparse regularization is introduced into the NMF model to ensure its sparseness. Second, according to the similarity prior of the same feature in adjacent pixels, a Total Variation regularization is added to the NMF model to improve the smoothness of the abundance maps. Finally, the signature of each end member is smoothed in spectral space: since discontinuities may emerge due to the removal of noisy bands, the spectral data are only piecewise smooth, and a piecewise smoothness constraint is therefore applied to each column of the end-member matrix. Experiments on two datasets, a synthetic dataset and the real-life Cuprite dataset, evaluate the effectiveness of the proposed method. The results show that it outperforms several state-of-the-art HU methods: on the Cuprite hyperspectral dataset, the proposed method's Spectral Angle Distance is 0.1694, compared to 0.1703 for TV-RSNMF, 0.1925 for L1/2NMF, and 0.1872 for VCA-FCLS.
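A bare-bones sketch of sparsity-regularized NMF unmixing via multiplicative updates; the Total Variation and piecewise-smoothness terms are omitted for brevity:

```python
import numpy as np

def sparse_nmf(X, k, lam=0.1, n_iter=200, eps=1e-9):
    """X (bands x pixels) ~ E (bands x k end members) @ A (k x pixels abundances)."""
    bands, pixels = X.shape
    rng = np.random.default_rng(0)
    E = rng.random((bands, k)); A = rng.random((k, pixels))
    for _ in range(n_iter):
        E *= (X @ A.T) / (E @ A @ A.T + eps)             # end-member update
        A *= (E.T @ X) / (E.T @ E @ A + lam + eps)       # L1 penalty shrinks abundances
    return E, A

E, A = sparse_nmf(np.random.rand(50, 1000), k=4)
```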
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

18 pages, 6144 KiB  
Article
An Embedded Portable Lightweight Platform for Real-Time Early Smoke Detection
by Bowen Liu, Bingjian Sun, Pengle Cheng and Ying Huang
Sensors 2022, 22(12), 4655; https://doi.org/10.3390/s22124655 - 20 Jun 2022
Cited by 3 | Viewed by 2363
Abstract
Advances in developing more accurate and fast smoke detection algorithms increase the need for computation in smoke detection, which typically demands personal computers or workstations. Better detection results require more complex network structures and higher hardware configurations, which disqualify such algorithms as lightweight, portable, and efficient smoke detectors. To address this challenge, this paper designs a lightweight, portable remote smoke front-end perception platform based on the Raspberry Pi under the Linux operating system. The platform has four modules: a source video input module, a target detection module, a display module, and an alarm module. Training images from public datasets are used to train a cascade classifier based on Local Binary Pattern (LBP) features with the AdaBoost algorithm in OpenCV. The classifier is then used to detect smoke in the incoming video stream, and the detection results are displayed dynamically in real time in the display module. If smoke is detected, the alarm module sends warning messages to users for real-time monitoring and warning on the scene. Case studies showed that the developed platform is robust on the test datasets, with high detection accuracy. As the platform is portable, does not require a personal computer, and detects smoke efficiently in real time, it provides an affordable lightweight smoke detection option for forest fire monitoring in practice.
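A sketch of the detection loop on the Raspberry Pi side, using OpenCV's cascade detector; the cascade file name is a placeholder for a model trained offline (e.g., with OpenCV's opencv_traincascade tool):

```python
import cv2

cascade = cv2.CascadeClassifier("smoke_lbp_cascade.xml")   # placeholder model file
cap = cv2.VideoCapture(0)                                  # source video input module

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    smoke = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in smoke:                             # display module overlay
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    if len(smoke):
        print("ALARM: smoke detected")                     # alarm-module hook
    cv2.imshow("smoke", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```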
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

17 pages, 3680 KiB  
Article
Micro-Expression-Based Emotion Recognition Using Waterfall Atrous Spatial Pyramid Pooling Networks
by Marzuraikah Mohd Stofa, Mohd Asyraf Zulkifley and Muhammad Ammirrul Atiqi Mohd Zainuri
Sensors 2022, 22(12), 4634; https://doi.org/10.3390/s22124634 - 19 Jun 2022
Cited by 5 | Viewed by 2068
Abstract
Understanding a person’s attitude or sentiment from their facial expressions has long been a straightforward task for humans. Numerous methods and techniques have been used to classify and interpret human emotions that are commonly communicated through facial expressions, with either macro- or micro-expressions. However, performing this task with computer-based techniques or algorithms has proven extremely difficult, not least because manual annotation is very time-consuming. Compared to macro-expressions, micro-expressions manifest the real emotional cues of a human, which the person tries to suppress and hide. Different methods and algorithms for recognizing emotions using micro-expressions are examined in this research, and the results are presented comparatively. The proposed technique is based on a multi-scale deep learning approach that extracts facial cues of various subjects under various conditions. Two popular multi-scale approaches are then explored, Spatial Pyramid Pooling (SPP) and Atrous Spatial Pyramid Pooling (ASPP), and optimized for emotion recognition using micro-expression cues. Four new architectures are introduced based on multi-layer, multi-scale convolutional networks using both direct and waterfall network flows. The experimental results show that the ASPP module with the waterfall network flow, which we coin WASPP-Net, outperforms state-of-the-art benchmark techniques with an accuracy of 80.5%. In future work, high-resolution variants of these multi-scale approaches can be explored to further improve recognition performance.
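A sketch contrasting the waterfall flow with parallel ASPP: each atrous branch consumes the previous branch's output rather than the shared input; the channel sizes and rates are illustrative:

```python
import torch
import torch.nn as nn

class WaterfallASPP(nn.Module):
    def __init__(self, c_in: int = 256, c: int = 64, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        prev = c_in
        for r in rates:
            self.branches.append(nn.Sequential(
                nn.Conv2d(prev, c, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(c), nn.ReLU(inplace=True)))
            prev = c                       # waterfall: next branch eats this output
        self.project = nn.Conv2d(c * len(rates), c, 1)

    def forward(self, x):
        outs = []
        for branch in self.branches:
            x = branch(x)                  # cascaded (waterfall) flow
            outs.append(x)
        return self.project(torch.cat(outs, dim=1))

y = WaterfallASPP()(torch.randn(1, 256, 14, 14))
```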
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

17 pages, 3335 KiB  
Article
Scene Uyghur Text Detection Based on Fine-Grained Feature Representation
by Yiwen Wang, Hornisa Mamat, Xuebin Xu, Alimjan Aysa and Kurban Ubul
Sensors 2022, 22(12), 4372; https://doi.org/10.3390/s22124372 - 09 Jun 2022
Cited by 6 | Viewed by 1845
Abstract
Scene text detection aims to precisely localize text in natural environments. At present, the application scenarios of text detection have gradually shifted from plain document text to more complex natural scenes, where objects with texture and morphology similar to text are prone to false recall amid complex background noise, and multi-scale texts are difficult to detect. A multi-directional scene Uyghur text detection model based on fine-grained feature representation and spatial feature fusion is therefore proposed, in which feature extraction and feature fusion are improved to enhance the network's ability to represent multi-scale features. In this method, multiple groups of 3 × 3 convolutional features are connected in a hierarchical-residual manner to build a residual network for feature extraction, which captures feature details and increases the receptive field of the network, adapting it to multi-scale text and long, connected cursive fonts while suppressing false positives from text-like objects. Secondly, an adaptive multi-level feature map fusion strategy is adopted to overcome the inconsistency of information in multi-scale feature map fusion. The proposed model achieves 93.94% and 84.92% F-measure on a self-built Uyghur dataset and the ICDAR2015 dataset, respectively, improving the accuracy of Uyghur text detection and suppressing false positives.
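A sketch of a hierarchical-residual (Res2Net-style) block of the kind described, in which grouped 3 × 3 convolutions each receive their own channel split plus the previous group's output:

```python
import torch
import torch.nn as nn

class HierarchicalResidualBlock(nn.Module):
    def __init__(self, channels: int = 64, scale: int = 4):
        super().__init__()
        self.scale = scale
        w = channels // scale
        self.convs = nn.ModuleList(
            nn.Conv2d(w, w, 3, padding=1, bias=False) for _ in range(scale - 1))

    def forward(self, x):
        splits = torch.chunk(x, self.scale, dim=1)
        out = [splits[0]]                       # first split passes through
        y = splits[0]
        for i, conv in enumerate(self.convs):
            y = conv(splits[i + 1] + y)         # hierarchical residual connection
            out.append(y)
        return torch.cat(out, dim=1) + x        # residual over the whole block

y = HierarchicalResidualBlock()(torch.randn(1, 64, 32, 32))
```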
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

18 pages, 4071 KiB  
Article
CAP-YOLO: Channel Attention Based Pruning YOLO for Coal Mine Real-Time Intelligent Monitoring
by Zhi Xu, Jingzhao Li, Yifan Meng and Xiaoming Zhang
Sensors 2022, 22(12), 4331; https://doi.org/10.3390/s22124331 - 08 Jun 2022
Cited by 10 | Viewed by 3031
Abstract
Real-time intelligent coal mine monitoring for pedestrian identification and positioning is an important means of ensuring production safety. Traditional object detection models based on neural networks require significant computational and storage resources, which makes it difficult to deploy them on edge devices for real-time intelligent monitoring. To address these problems, CAP-YOLO (Channel Attention based Pruning YOLO) and AEPSM (adaptive image enhancement parameter selection module) are proposed in this paper to achieve real-time intelligent analysis of coal mine surveillance videos. Firstly, DCAM (Deep Channel Attention Module) is proposed to evaluate the importance of channels in YOLOv3. Secondly, the filters corresponding to low-importance channels are pruned to generate CAP-YOLO, whose accuracy is recovered through fine-tuning. Finally, considering that lighting conditions vary across coal mine sites, AEPSM is proposed to select parameters for CLAHE (Contrast Limited Adaptive Histogram Equalization) at different sites. Experimental results show that the weight file of CAP-YOLO is 8.3× smaller than that of YOLOv3, while its mAP is only 7% lower, and its inference speed is three times faster. On an NVIDIA Jetson TX2, CAP-YOLO achieves an inference speed of 31 FPS.
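A sketch of attention-guided channel pruning: per-channel importance scores (standing in for DCAM's output) select which filters of a convolution survive; fine-tuning would follow:

```python
import torch
import torch.nn as nn

def prune_conv_by_scores(conv: nn.Conv2d, scores: torch.Tensor, keep_ratio=0.5):
    """Return a new Conv2d keeping the highest-scoring output channels."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.topk(scores, n_keep).indices.sort().values
    new_conv = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                         conv.stride, conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])        # carry over kept filters
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
    return new_conv, keep        # `keep` tells the next layer which inputs remain

conv = nn.Conv2d(64, 128, 3, padding=1)
scores = torch.rand(128)         # per-channel importance (e.g., attention means)
pruned, kept = prune_conv_by_scores(conv, scores)
```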
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

22 pages, 7178 KiB  
Article
Multi-Scale Deep Neural Network Based on Dilated Convolution for Spacecraft Image Segmentation
by Yuan Liu, Ming Zhu, Jing Wang, Xiangji Guo, Yifan Yang and Jiarong Wang
Sensors 2022, 22(11), 4222; https://doi.org/10.3390/s22114222 - 01 Jun 2022
Cited by 11 | Viewed by 2811
Abstract
In recent years, image segmentation techniques based on deep learning have found many applications in remote sensing, medicine, and autonomous driving. In space exploration, segmenting spacecraft objects from monocular images can support space station on-orbit assembly tasks as well as space target position and attitude estimation, which has essential research value and broad application prospects. However, no segmentation network has been designed specifically for spacecraft targets. This paper proposes an end-to-end spacecraft image segmentation network using the semantic segmentation network DeepLabv3+ as the basic framework. We develop a multi-scale neural network based on dilated convolution. First, the feature extraction capability is improved by the dilated convolutional network. Second, we introduce a channel attention mechanism into the network to recalibrate the feature responses. Finally, we design a parallel atrous spatial pyramid pooling (ASPP) structure that enhances the contextual information of the network. To verify the effectiveness of the method, we built a spacecraft segmentation dataset on which we conducted experiments. The results show that the proposed encoder + attention + decoder structure, which attends to both high-level and low-level features, obtains clear and complete masks of spacecraft targets with high segmentation accuracy and achieves a significant improvement over DeepLabv3+. We also conducted an ablation study to examine the effectiveness of our network framework.
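A sketch of one common realization of the channel attention idea, a squeeze-and-excitation style gate that recalibrates feature responses:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # squeeze: global context
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)                      # excite: reweight channels

y = ChannelAttention(256)(torch.randn(1, 256, 32, 32))
```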
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
