Topic Editors

Prof. Dr. Antonio Fernández-Caballero, Instituto de Investigación en Informática de Albacete, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
Prof. Dr. Byung-Gyu Kim, Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea

Applied Computer Vision and Pattern Recognition: 2nd Volume

Abstract submission deadline: 30 October 2024
Manuscript submission deadline: 30 December 2024
Viewed by 78056

Topic Information

Dear Colleagues,

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Computer vision tasks include methods for acquiring digital images (through image sensors), image processing, and image analysis to reach an understanding of digital images. In general, it deals with the extraction of high-dimensional data from the real world in order to produce numerical or symbolic information that a computer can interpret. For interpretation, computer vision is closely related to pattern recognition.

Indeed, pattern recognition is the process of recognizing patterns by using machine learning algorithms. Pattern recognition can be defined as the identification and classification of meaningful patterns of data based on the extraction and comparison of characteristic properties or features of the data. Pattern recognition is a very important area of research and application, underpinning developments in related fields, such as computer vision, image processing, text and document analysis, and neural networks. It is closely related to machine learning and finds applications in rapidly emerging areas, such as biometrics, bioinformatics, multimedia data analysis, and, more recently, data science. Nowadays, data-driven approaches (such as deep learning) are widely used to achieve pattern recognition and classification in many applications.

This Topic, on Applied Computer Vision and Pattern Recognition, invites papers on theoretical and applied issues, including, but not limited to, the following areas:

  • Statistical, structural, and syntactic pattern recognition;
  • Neural networks, machine learning, and deep learning;
  • Computer vision, robot vision, and machine vision;
  • Multimedia systems and multimedia content;
  • Biosignal processing, speech processing, image processing, and video processing;
  • Data mining, information retrieval, big data, and business intelligence.

This Topic will present the results of research describing recent advances in both the computer vision and pattern recognition fields.

Prof. Dr. Antonio Fernández-Caballero
Prof. Dr. Byung-Gyu Kim
Topic Editors

Keywords

  • pattern recognition
  • neural networks, machine learning
  • deep learning, artificial intelligence
  • computer vision
  • multimedia
  • data mining
  • signal processing
  • image processing

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Applied Sciences (applsci) | 2.5 | 5.3 | 2011 | 17.8 days | CHF 2400
Electronics (electronics) | 2.6 | 5.3 | 2012 | 16.8 days | CHF 2400
Machine Learning and Knowledge Extraction (make) | 4.0 | 6.3 | 2019 | 27.1 days | CHF 1800
Journal of Imaging (jimaging) | 2.7 | 5.9 | 2015 | 20.9 days | CHF 1800
Sensors (sensors) | 3.4 | 7.3 | 2001 | 16.8 days | CHF 2600

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (44 papers)

17 pages, 12919 KiB  
Article
Fast Fault Line Selection Technology of Distribution Network Based on MCECA-CloFormer
by Can Ding, Pengcheng Ma, Changhua Jiang and Fei Wang
Appl. Sci. 2024, 14(18), 8270; https://doi.org/10.3390/app14188270 - 13 Sep 2024
Viewed by 439
Abstract
When a single-phase grounding fault occurs in a resonant ground distribution network, the fault characteristics are weak and it is difficult to detect the fault line. Therefore, a fast fault line selection method based on MCECA-CloFormer is proposed in this paper. Firstly, zero-sequence current signals were converted into images using the moving average filter method and motif difference field to construct the fault data set. Then, the ECA module was modified to MCECA (MultiCNN-ECA) so that it can accept data input from multiple measurement points. Secondly, the lightweight model CloFormer was used in the back end of the MCECA module to further perceive the feature map and complete the establishment of the line selection model. Finally, the line selection model was trained, and information such as the model weights was saved. The simulation results demonstrated that the pre-trained MCECA-CloFormer achieved a line selection accuracy of over 98% under 10 dB noise, with a remarkably low single-fault processing time of approximately 0.04 s. Moreover, it exhibited suitability for arc high-resistance grounding faults, data-missing cases, neutral-point ungrounded systems, and active distribution networks. In addition, the method was still valid when tested with actual field recording data. Full article
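The pipeline above begins by smoothing zero-sequence current signals before imaging them. As a rough illustration of that first step only (not the authors' code), here is a minimal moving average filter in NumPy; the window size and the simulated 50 Hz signal are arbitrary assumptions:

```python
import numpy as np

def moving_average(signal: np.ndarray, window: int = 5) -> np.ndarray:
    """Smooth a 1-D zero-sequence current signal with a centered moving average."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")  # 'same' keeps output aligned

# Hypothetical example: a noisy 50 Hz zero-sequence current
t = np.linspace(0, 0.1, 1000)
i0 = np.sin(2 * np.pi * 50 * t) + 0.2 * np.random.randn(t.size)
i0_smooth = moving_average(i0, window=5)
```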

13 pages, 1876 KiB  
Article
Comparative Analysis of Solar Radiation Forecasting Techniques in Zacatecas, Mexico
by Martha Isabel Escalona-Llaguno, Luis Octavio Solís-Sánchez, Celina L. Castañeda-Miranda, Carlos A. Olvera-Olvera, Ma. del Rosario Martinez-Blanco, Héctor A. Guerrero-Osuna, Rodrigo Castañeda-Miranda, Germán Díaz-Flórez and Gerardo Ornelas-Vargas
Appl. Sci. 2024, 14(17), 7449; https://doi.org/10.3390/app14177449 - 23 Aug 2024
Viewed by 493
Abstract
This work explores the prediction of daily Global Horizontal Irradiance (GHI) patterns in the region of Zacatecas, Mexico, using a diverse range of predictive models, encompassing traditional regressors and advanced neural networks like Evolutionary Neural Architecture Search (ENAS), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Meta’s Prophet. This work addresses a notable gap in regional research and aims to democratize access to accurate solar radiation forecasting methodologies. The evaluations, carried out using time series data obtained by Comisión Nacional del Agua (Conagua) covering the period from 2015 to 2018, reveal different model performances under different sky conditions, showcasing strengths in forecasting clear and partially cloudy days while encountering challenges with cloudy conditions. Overall, correlation coefficients (r) ranged between 0.55 and 0.72, with Root Mean Square Error % (RMSE %) values spanning from 20.05% to 20.54%, indicating moderate to good predictive accuracy. This study underscores the need for longer datasets to bolster future predictive capabilities. By democratizing access to these predictive tools, this research facilitates informed decision-making in renewable energy planning and sustainable development strategies tailored to the unique environmental dynamics of the region of Zacatecas and comparable regions. Full article
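For readers reproducing the evaluation, the two headline metrics are straightforward to compute; a sketch assuming NumPy arrays of measured and forecast GHI (normalizing RMSE by the mean of the measurements is one common convention and an assumption here):

```python
import numpy as np

def correlation_r(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Pearson correlation coefficient r between measured and forecast GHI."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse_percent(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """RMSE normalized by the mean measurement, in percent (RMSE %)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(100.0 * rmse / np.mean(y_true))
```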

22 pages, 8959 KiB  
Article
Enhanced Detection and Recognition of Road Objects in Infrared Imaging Using Multi-Scale Self-Attention
by Poyi Liu, Yunkang Zhang, Guanlun Guo and Jiale Ding
Sensors 2024, 24(16), 5404; https://doi.org/10.3390/s24165404 - 21 Aug 2024
Viewed by 583
Abstract
In infrared detection scenarios, detecting and recognizing low-contrast and small-sized targets has always been a challenge in the field of computer vision, particularly in complex road traffic environments. Traditional target detection methods usually perform poorly when processing infrared small targets, mainly due to their inability to effectively extract key features and the significant feature loss that occurs during feature transmission. To address these issues, this paper proposes a fast detection and recognition model based on a multi-scale self-attention mechanism, specifically for small road targets in infrared detection scenarios. We first introduce and improve the DyHead structure based on the YOLOv8 algorithm, which employs a multi-head self-attention mechanism to capture target features at various scales and enhance the model’s perception of small targets. Additionally, to prevent information loss during the feature transmission process via the FPN structure in traditional YOLO algorithms, this paper introduces and enhances the Gather-and-Distribute Mechanism. By computing dependencies between features using self-attention, it reallocates attention weights in the feature maps to highlight important features and suppress irrelevant information. These improvements significantly enhance the model’s capability to detect small targets. Moreover, to further increase detection speed, we pruned the network architecture to reduce computational complexity and parameter count, making the model suitable for real-time processing scenarios. Experiments on our self-built infrared road traffic dataset (mainly including two types of targets: vehicles and people) show that compared with the baseline, our method achieves a 3.1% improvement in AP and a 2.5% increase in mAP on the VisDrone2019 dataset, showing significant enhancements in both detection accuracy and processing speed for small targets, with improved robustness and adaptability. Full article
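The core ingredient, multi-head self-attention over a CNN feature map, can be sketched in a few lines of PyTorch. This is a generic illustration of the mechanism, not the paper's DyHead or Gather-and-Distribute implementation, and the channel/head counts are arbitrary:

```python
import torch
import torch.nn as nn

class MapSelfAttention(nn.Module):
    """Multi-head self-attention applied to a flattened CNN feature map."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=heads,
                                          batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C) token sequence
        out, _ = self.attn(tokens, tokens, tokens)   # self-attention over positions
        return out.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(1, 64, 20, 20)
y = MapSelfAttention(64)(x)   # same shape as x
```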

32 pages, 2074 KiB  
Article
Symbol Detection in Mechanical Engineering Sketches: Experimental Study on Principle Sketches with Synthetic Data Generation and Deep Learning
by Sebastian Bickel, Stefan Goetz and Sandro Wartzack
Appl. Sci. 2024, 14(14), 6106; https://doi.org/10.3390/app14146106 - 12 Jul 2024
Viewed by 848
Abstract
Digital transformation is omnipresent in our daily lives and its impact is noticeable through new technologies, like smart devices, AI-Chatbots or the changing work environment. This digitalization also takes place in product development, with the integration of many technologies, such as Industry 4.0, digital twins or data-driven methods, to improve the quality of new products and to save time and costs during the development process. Therefore, the use of data-driven methods reusing existing data has great potential. However, data from product design are very diverse and strongly depend on the respective development phase. One of the first few product representations are sketches and drawings, which represent the product in a simplified and condensed way. But, to reuse the data, the existing sketches must be found with an automated approach, allowing the contained information to be utilized. One approach to solve this problem is presented in this paper, with the detection of principle sketches in the early phase of the development process. The aim is to recognize the symbols in these sketches automatically with object detection models. Therefore, existing approaches were analyzed and a new procedure developed, which uses synthetic training data generation. In the next step, a total of six different data generation types were analyzed and tested using six different one- and two-stage detection models. The entire procedure was then evaluated on two unknown test datasets, one focusing on different gearbox variants and a second dataset derived from CAD assemblies. In the last sections the findings are discussed and a procedure with high detection accuracy is determined. Full article

20 pages, 3739 KiB  
Article
Automatic Switching of Electric Locomotive Power in Railway Neutral Sections Using Image Processing
by Christopher Thembinkosi Mcineka, Nelendran Pillay, Kevin Moorgas and Shaveen Maharaj
J. Imaging 2024, 10(6), 142; https://doi.org/10.3390/jimaging10060142 - 11 Jun 2024
Viewed by 1031
Abstract
This article presents a computer vision-based approach to switching electric locomotive power supplies as the vehicle approaches a railway neutral section. Neutral sections are defined as a phase break in which the objective is to separate two single-phase traction supplies on an overhead railway supply line. This separation prevents flashovers due to high voltages caused by the locomotives shorting both electrical phases. The typical system of switching traction supplies automatically employs the use of electro-mechanical relays and induction magnets. In this paper, an image classification approach is proposed to replace the conventional electro-mechanical system with two unique visual markers that represent the ‘Open’ and ‘Close’ signals to initiate the transition. When the computer vision model detects either marker, the vacuum circuit breakers inside the electrical locomotive will be triggered to their respective positions depending on the identified image. A Histogram of Oriented Gradient technique was implemented for feature extraction during the training phase and a Linear Support Vector Machine algorithm was trained for the target image classification. For the task of image segmentation, the Circular Hough Transform shape detection algorithm was employed to locate the markers in the captured images and provided cartesian plane coordinates for segmenting the Object of Interest. A signal marker classification accuracy of 94% with 75 objects per second was achieved using a Linear Support Vector Machine during the experimental testing phase. Full article
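The classical building blocks named here are all available off the shelf; a condensed sketch with OpenCV, scikit-image, and scikit-learn (the marker classes, patch size, and Hough parameters are illustrative assumptions, not the paper's tuned values):

```python
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def find_marker(gray: np.ndarray):
    """Locate a circular signal marker with the Circular Hough Transform."""
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=100,
                               param1=100, param2=40, minRadius=20, maxRadius=120)
    return None if circles is None else circles[0, 0]  # (x, y, r) of first circle

def hog_features(patch: np.ndarray) -> np.ndarray:
    """HOG descriptor of a segmented marker patch, resized to a fixed size."""
    patch = cv2.resize(patch, (64, 64))
    return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Training (hypothetical): X = HOG vectors of 'Open'/'Close' patches, y = labels
# clf = LinearSVC().fit(X, y)
# clf.predict([hog_features(segmented_patch)])  # -> 'Open' or 'Close'
```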

21 pages, 5602 KiB  
Article
EMR-HRNet: A Multi-Scale Feature Fusion Network for Landslide Segmentation from Remote Sensing Images
by Yuanhang Jin, Xiaosheng Liu and Xiaobin Huang
Sensors 2024, 24(11), 3677; https://doi.org/10.3390/s24113677 - 6 Jun 2024
Viewed by 680
Abstract
Landslides constitute a significant hazard to human life, safety and natural resources. Traditional landslide investigation methods demand considerable human effort and expertise. To address this issue, this study introduces an innovative landslide segmentation framework, EMR-HRNet, aimed at enhancing accuracy. Initially, a novel data augmentation technique, CenterRep, is proposed, not only augmenting the training dataset but also enabling the model to more effectively capture the intricate features of landslides. Furthermore, this paper integrates a RefConv and Multi-Dconv Head Transposed Attention (RMA) feature pyramid structure into the HRNet model, augmenting the model’s capacity for semantic recognition and expression at various levels. Last, the incorporation of the Dilated Efficient Multi-Scale Attention (DEMA) block substantially widens the model’s receptive field, bolstering its capability to discern local features. Rigorous evaluations on the Bijie dataset and the Sichuan and surrounding area dataset demonstrate that EMR-HRNet outperforms other advanced semantic segmentation models, achieving mIoU scores of 81.70% and 71.68%, respectively. Additionally, ablation studies conducted across the comprehensive dataset further corroborate the enhancements’ efficacy. The results indicate that EMR-HRNet excels in processing satellite and UAV remote sensing imagery, showcasing its significant potential in multi-source optical remote sensing for landslide segmentation. Full article
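The reported mIoU metric can be computed directly from predicted and ground-truth label maps; a minimal sketch assuming integer-coded masks (binary landslide/background here):

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 2) -> float:
    """Mean intersection-over-union across classes for integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```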

24 pages, 21847 KiB  
Article
A Learnable Viewpoint Evolution Method for Accurate Pose Estimation of Complex Assembled Product
by Delong Zhao, Feifei Kong and Fuzhou Du
Appl. Sci. 2024, 14(11), 4405; https://doi.org/10.3390/app14114405 - 22 May 2024
Viewed by 641
Abstract
Balancing adaptability, reliability, and accuracy in vision technology has always been a major bottleneck limiting its application in appearance assurance for complex objects in high-end equipment production. Data-driven deep learning shows robustness to feature diversity but is limited by interpretability and accuracy. The traditional vision scheme is reliable and can achieve high accuracy, but its adaptability is insufficient. The deeper reason is the lack of appropriate architecture and integration strategies between the learning paradigm and empirical design. To this end, a learnable viewpoint evolution algorithm for high-accuracy pose estimation of complex assembled products under free view is proposed. To alleviate the balance problem of exploration and optimization in estimation, shape-constrained virtual–real matching, evolvable feasible region, and specialized population migration and reproduction strategies are designed. Furthermore, a learnable evolution control mechanism is proposed, which integrates a guided model based on experience and is cyclic-trained with automatically generated effective trajectories to improve the evolution process. Compared to the 1.69°,55.67 mm of the state-of-the-art data-driven method and the 1.28°,77.67 mm of the classic strategy combination, the pose estimation error of complex assembled product in this study is 0.23°,23.71 mm, which proves the effectiveness of the proposed method. Meanwhile, through in-depth exploration, the robustness, parameter sensitivity, and adaptability to the virtual–real appearance variations are sequentially verified. Full article

22 pages, 38737 KiB  
Article
A Computer Vision Framework for Structural Analysis of Hand-Drawn Engineering Sketches
by Isaac Joffe, Yuchen Qian, Mohammad Talebi-Kalaleh and Qipei Mei
Sensors 2024, 24(9), 2923; https://doi.org/10.3390/s24092923 - 3 May 2024
Viewed by 1054
Abstract
Structural engineers are often required to draw two-dimensional engineering sketches for quick structural analysis, either by hand calculation or using analysis software. However, calculation by hand is slow and error-prone, and the manual conversion of a hand-drawn sketch into a virtual model is tedious and time-consuming. This paper presents a complete and autonomous framework for converting a hand-drawn engineering sketch into an analyzed structural model using a camera and computer vision. In this framework, a computer vision object detection stage initially extracts information about the raw features in the image of the beam diagram. Next, a computer vision number-reading model transcribes any handwritten numerals appearing in the image. Then, feature association models are applied to characterize the relationships among the detected features in order to build a comprehensive structural model. Finally, the structural model generated is analyzed using OpenSees. In the system presented, the object detection model achieves a mean average precision of 99.1%, the number-reading model achieves an accuracy of 99.0%, and the models in the feature association stage achieve accuracies ranging from 95.1% to 99.5%. Overall, the tool analyzes 45.0% of images entirely correctly and the remaining 55.0% of images partially correctly. The proposed framework holds promise for other types of structural sketches, such as trusses and frames. Moreover, it can be a valuable tool for structural engineers that is capable of improving the efficiency, safety, and sustainability of future construction projects. Full article

19 pages, 15195 KiB  
Article
Color and Luminance Separated Enhancement for Low-Light Images with Brightness Guidance
by Feng Zhang, Xinran Liu, Changxin Gao and Nong Sang
Sensors 2024, 24(9), 2711; https://doi.org/10.3390/s24092711 - 24 Apr 2024
Viewed by 861
Abstract
Existing retinex-based low-light image enhancement strategies focus heavily on crafting complex networks for Retinex decomposition but often result in imprecise estimations. To overcome the limitations of previous methods, we introduce a straightforward yet effective strategy for Retinex decomposition, dividing images into colormaps and graymaps as new estimations for reflectance and illumination maps. The enhancement of these maps is separately conducted using a diffusion model for improved restoration. Furthermore, we address the dual challenge of perturbation removal and brightness adjustment in illumination maps by incorporating brightness guidance. This guidance aids in precisely adjusting the brightness while eliminating disturbances, ensuring a more effective enhancement process. Extensive quantitative and qualitative experimental analyses demonstrate that our proposed method improves the performance by approximately 4.4% on the LOL dataset compared to other state-of-the-art diffusion-based methods, while also validating the model’s generalizability across multiple real-world datasets. Full article
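The paper defines its own colormap/graymap estimates; purely as an illustrative stand-in, one simple Retinex-style split divides an RGB image by a max-channel luminance proxy (the exact decomposition used by the authors may differ):

```python
import numpy as np

def decompose(img: np.ndarray, eps: float = 1e-6):
    """Split an RGB image (float, values in [0, 1]) into a graymap
    (illumination-like map) and a colormap (reflectance-like ratio map)."""
    graymap = img.max(axis=2, keepdims=True)   # per-pixel max over channels
    colormap = img / (graymap + eps)           # per-channel ratio to that max
    return colormap, graymap

# Sanity check: colormap * graymap reconstructs the input (up to eps)
```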

18 pages, 9114 KiB  
Article
Study on Gesture Recognition Method with Two-Stream Residual Network Fusing sEMG Signals and Acceleration Signals
by Zhigang Hu, Shen Wang, Cuisi Ou, Aoru Ge and Xiangpan Li
Sensors 2024, 24(9), 2702; https://doi.org/10.3390/s24092702 - 24 Apr 2024
Viewed by 776
Abstract
Currently, surface EMG signals have a wide range of applications in human–computer interaction systems. However, selecting features for gesture recognition models based on traditional machine learning can be challenging and may not yield satisfactory results. Considering the strong nonlinear generalization ability of neural networks, this paper proposes a two-stream residual network model with an attention mechanism for gesture recognition. One branch processes surface EMG signals, while the other processes hand acceleration signals. Segmented networks are utilized to fully extract the physiological and kinematic features of the hand. To enhance the model’s capacity to learn crucial information, we introduce an attention mechanism after global average pooling. This mechanism strengthens relevant features and weakens irrelevant ones. Finally, the deep features obtained from the two branches of learning are fused to further improve the accuracy of multi-gesture recognition. The experiments conducted on the NinaPro DB2 public dataset resulted in a recognition accuracy of 88.25% for 49 gestures. This demonstrates that our network model can effectively capture gesture features, enhancing accuracy and robustness across various gestures. This approach to multi-source information fusion is expected to provide more accurate and real-time commands for exoskeleton robots and myoelectric prosthetic control systems, thereby enhancing the user experience and the naturalness of robot operation. Full article
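An attention block placed after global average pooling, as described, is typically a squeeze-and-excitation style gate; a generic PyTorch sketch (the reduction ratio is an assumption, not the paper's value):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate: global average pooling followed by a
    small bottleneck MLP that strengthens relevant channels and weakens others."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                  # global average pooling -> (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # channel-wise reweighting
```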

21 pages, 1948 KiB  
Article
Tensorized Discrete Multi-View Spectral Clustering
by Qin Li, Geng Yang, Yu Yun, Yu Lei and Jane You
Electronics 2024, 13(3), 491; https://doi.org/10.3390/electronics13030491 - 24 Jan 2024
Viewed by 911
Abstract
Discrete spectral clustering directly obtains the discrete labels of data, but existing clustering methods assume that the real-valued indicator matrices of different views are identical, which is unreasonable in practical applications. Moreover, they do not effectively exploit the spatial structure and complementary information embedded in views. To overcome this disadvantage, we propose a tensorized discrete multi-view spectral clustering model that integrates spectral embedding and spectral rotation into a unified framework. Specifically, we leverage the weighted tensor nuclear-norm regularizer on the third-order tensor, which consists of the real-valued indicator matrices of views, to exploit the complementary information embedded in the indicator matrices of different views. Furthermore, we present an adaptively weighted scheme that takes into account the relationship between views for clustering. Finally, discrete labels are obtained by spectral rotation. Experiments show the effectiveness of our proposed method. Full article

20 pages, 15144 KiB  
Article
HRYNet: A Highly Robust YOLO Network for Complex Road Traffic Object Detection
by Lindong Tang, Lijun Yun, Zaiqing Chen and Feiyan Cheng
Sensors 2024, 24(2), 642; https://doi.org/10.3390/s24020642 - 19 Jan 2024
Cited by 6 | Viewed by 2315
Abstract
Object detection is a crucial component of the perception system in autonomous driving. However, the road scene presents a highly intricate environment where the visibility and characteristics of traffic targets are susceptible to attenuation and loss due to various complex road scenarios such as lighting conditions, weather conditions, time of day, background elements, and traffic density. Nevertheless, current object detection networks lack sufficient learning capability when detecting such targets. This also exacerbates the loss of features during the feature extraction and fusion process, significantly compromising the network’s detection performance on traffic targets. This paper presents a novel methodology to overcome the above concerns, namely HRYNet. Firstly, a dual fusion gradual pyramid structure (DFGPN) is introduced, which employs a two-stage gradient fusion strategy to enhance the generation of more comprehensive multi-scale high-level semantic information, strengthen the interconnection between non-adjacent feature layers, and reduce the information gap that exists between them. HRYNet introduces an anti-interference feature extraction module, the residual multi-head self-attention mechanism (RMA). RMA enhances the target information by implementing a characteristic channel weighting policy, thereby reducing background interference and improving the attention capability of the network. Finally, the detection performance of HRYNet was evaluated by utilizing three datasets: the horizontally collected dataset BDD100K, the UAV high-altitude dataset Visdrone, and a custom dataset. Experimental results demonstrate that HRYNet achieves a higher mAP_0.5 compared with YOLOv8s on the three datasets, with increases of 10.8%, 16.7%, and 5.5%, respectively. To optimize HRYNet for mobile devices, this study presents Lightweight HRYNet (LHRYNet), which effectively reduces the number of model parameters by 2 million. The results demonstrate that LHRYNet outperforms YOLOv8s in terms of mAP_0.5, with improvements of 6.7%, 10.9%, and 2.5% observed on the three datasets, respectively. Full article

22 pages, 10627 KiB  
Article
ScanGuard-YOLO: Enhancing X-ray Prohibited Item Detection with Significant Performance Gains
by Xianning Huang and Yaping Zhang
Sensors 2024, 24(1), 102; https://doi.org/10.3390/s24010102 - 24 Dec 2023
Cited by 3 | Viewed by 1368
Abstract
To address the problem of low recall rate in the detection of prohibited items in X-ray images due to the severe object occlusion and complex background, an X-ray prohibited item detection network, ScanGuard-YOLO, based on the YOLOv5 architecture, is proposed to effectively improve the model’s recall rate and the comprehensive metric F1 score. Firstly, the RFB-s module was added to the end part of the backbone, and dilated convolution was used to increase the receptive field of the backbone network to better capture global features. In the neck section, the efficient RepGFPN module was employed to fuse multiscale information from the backbone output. This aimed to capture details and contextual information at various scales, thereby enhancing the model’s understanding and representation capability of the object. Secondly, a novel detection head was introduced to unify scale-awareness, spatial-awareness, and task-awareness altogether, which significantly improved the representation ability of the object detection heads. Finally, the bounding box regression loss function was defined as the WIOUv3 loss, effectively balancing the contribution of low-quality and high-quality samples to the loss. ScanGuard-YOLO was tested on OPIXray and HiXray datasets, showing significant improvements compared to the baseline model. The mean average precision (mAP@0.5) increased by 2.3% and 1.6%, the recall rate improved by 4.5% and 2%, and the F1 score increased by 2.3% and 1%, respectively. The experimental results demonstrate that ScanGuard-YOLO effectively enhances the detection capability of prohibited items in complex backgrounds and exhibits broad prospects for application. Full article

17 pages, 6588 KiB  
Article
Autoencoder-Based Visual Anomaly Localization for Manufacturing Quality Control
by Devang Mehta and Noah Klarmann
Mach. Learn. Knowl. Extr. 2024, 6(1), 1-17; https://doi.org/10.3390/make6010001 - 21 Dec 2023
Cited by 7 | Viewed by 2348
Abstract
Manufacturing industries require the efficient and voluminous production of high-quality finished goods. In the context of Industry 4.0, visual anomaly detection poses an optimistic solution for automatically controlled product quality with high precision. In general, automation based on computer vision is a promising solution to prevent bottlenecks at the product quality checkpoint. We considered recent advancements in machine learning to improve visual defect localization, but challenges persist in obtaining a balanced feature set and database of the wide variety of defects occurring in the production line. Hence, this paper proposes a defect localizing autoencoder with unsupervised class selection by clustering with k-means the features extracted from a pretrained VGG16 network. Moreover, the selected classes of defects are augmented with natural wild textures to simulate artificial defects. The study demonstrates the effectiveness of the defect localizing autoencoder with unsupervised class selection for improving defect detection in manufacturing industries. The proposed methodology shows promising results with precise and accurate localization of quality defects on melamine-faced boards for the furniture industry. Incorporating artificial defects into the training data shows significant potential for practical implementation in real-world quality control scenarios. Full article
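The unsupervised class-selection step (k-means on pretrained VGG16 features) maps naturally onto torchvision and scikit-learn; a sketch under the assumption of ImageNet weights, global-average-pooled conv features, and an arbitrary cluster count:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.cluster import KMeans

# Pretrained VGG16 convolutional stack as a fixed feature extractor
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

transform = T.Compose([T.ToTensor(), T.Resize((224, 224)),
                       T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

@torch.no_grad()
def embed(img):
    """512-D embedding: conv feature map pooled over the spatial dimensions."""
    fmap = vgg(transform(img).unsqueeze(0))          # (1, 512, 7, 7)
    return fmap.mean(dim=(2, 3)).squeeze(0).numpy()

# Hypothetical usage: cluster product images into candidate defect classes
# features = np.stack([embed(im) for im in images])
# clusters = KMeans(n_clusters=5).fit_predict(features)
```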

17 pages, 3683 KiB  
Article
A Weakly Supervised Semantic Segmentation Model of Maize Seedlings and Weed Images Based on Scrawl Labels
by Lulu Zhao, Yanan Zhao, Ting Liu and Hanbing Deng
Sensors 2023, 23(24), 9846; https://doi.org/10.3390/s23249846 - 15 Dec 2023
Viewed by 962
Abstract
The task of semantic segmentation of maize and weed images using fully supervised deep learning models requires a large number of pixel-level mask labels, and the complex morphology of the maize and weeds themselves can further increase the cost of image annotation. To solve this problem, we proposed a Scrawl Label-based Weakly Supervised Semantic Segmentation Network (SL-Net). SL-Net consists of a pseudo label generation module, encoder, and decoder. The pseudo label generation module converts scrawl labels into pseudo labels that replace the manual labels involved in network training, improving the backbone network for feature extraction based on the DeepLab-V3+ model and using a transfer learning strategy to optimize the training process. The results show that the intersection over union of the pseudo labels generated by the pseudo label module with the ground truth is 83.32%, and the cosine similarity is 93.55%. In the semantic segmentation testing of SL-Net on seedling images of maize plants and weeds, the mean intersection over union and average precision reached 87.30% and 94.06%, which is higher than the semantic segmentation accuracy of DeepLab-V3+ and PSPNet under weakly and fully supervised learning conditions. We conduct experiments to demonstrate the effectiveness of the proposed method. Full article
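The two agreement scores used to validate pseudo labels against ground truth are easy to reproduce; a sketch assuming boolean/binary mask arrays:

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union between two non-empty boolean masks."""
    return float(np.logical_and(a, b).sum() / np.logical_or(a, b).sum())

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between flattened (pseudo vs. ground-truth) masks."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```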

13 pages, 4444 KiB  
Article
Go-Game Image Recognition Based on Improved Pix2pix
by Yanxia Zheng and Xiyuan Qian
J. Imaging 2023, 9(12), 273; https://doi.org/10.3390/jimaging9120273 - 7 Dec 2023
Viewed by 1789
Abstract
Go is a game that can be won or lost based on the number of intersections surrounded by black or white pieces. The traditional method is a manual counting method, which is time-consuming and error-prone. In addition, the generalization of the current Go-image-recognition methods is poor, and accuracy needs to be further improved. To solve these problems, a Go-game image recognition based on an improved pix2pix was proposed. Firstly, a channel-coordinate mixed-attention (CCMA) mechanism was designed by combining channel attention and coordinate attention effectively; therefore, the model could learn the target feature information. Secondly, in order to obtain the long-distance contextual information, a deep dilated-convolution (DDC) module was proposed, which densely linked the dilated convolution with different dilated rates. The experimental results showed that compared with other existing Go-image-recognition methods, such as DenseNet, VGG-16, and Yolo v5, the proposed method could effectively improve the generalization ability and accuracy of a Go-image-recognition model, and the average accuracy rate was over 99.99%. Full article

17 pages, 11761 KiB  
Article
RepECN: Making ConvNets Better Again for Efficient Image Super-Resolution
by Qiangpu Chen, Jinghui Qin and Wushao Wen
Sensors 2023, 23(23), 9575; https://doi.org/10.3390/s23239575 - 2 Dec 2023
Viewed by 1111
Abstract
Traditional Convolutional Neural Network (ConvNet, CNN)-based image super-resolution (SR) methods have lower computation costs, making them more friendly for real-world scenarios. However, they suffer from lower performance. On the contrary, Vision Transformer (ViT)-based SR methods have achieved impressive performance recently, but these methods often suffer from high computation costs and model storage overhead, making them hard to meet the requirements in practical application scenarios. In practical scenarios, an SR model should reconstruct an image with high quality and fast inference. To handle this issue, we propose a novel CNN-based Efficient Residual ConvNet enhanced with structural Re-parameterization (RepECN) for a better trade-off between performance and efficiency. A stage-to-block hierarchical architecture design paradigm inspired by ViT is utilized to keep the state-of-the-art performance, while the efficiency is ensured by abandoning the time-consuming Multi-Head Self-Attention (MHSA) and by re-designing the block-level modules based on CNN. Specifically, RepECN consists of three structural modules: a shallow feature extraction module, a deep feature extraction, and an image reconstruction module. The deep feature extraction module comprises multiple ConvNet Stages (CNS), each containing 6 Re-Parameterization ConvNet Blocks (RepCNB), a head layer, and a residual connection. The RepCNB utilizes larger kernel convolutions rather than MHSA to enhance the capability of learning long-range dependence. In the image reconstruction module, an upsampling module consisting of nearest-neighbor interpolation and pixel attention is deployed to reduce parameters and maintain reconstruction performance, while bicubic interpolation on another branch allows the backbone network to focus on learning high-frequency information. The extensive experimental results on multiple public benchmarks show that our RepECN can achieve 2.5∼5× faster inference than the state-of-the-art ViT-based SR model with better or competitive super-resolving performance, indicating that our RepECN can reconstruct high-quality images with fast inference. Full article
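The upsampling module described (nearest-neighbor interpolation plus pixel attention) can be sketched generically in PyTorch; layer sizes below are assumptions, and the paper's RepCNB re-parameterization is omitted:

```python
import torch
import torch.nn as nn

class PAUpsample(nn.Module):
    """Upsampling block: nearest-neighbor interpolation followed by pixel
    attention (a 1x1 conv producing a sigmoid gate per spatial position)."""
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="nearest")
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.pa = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(self.up(x))
        return x * self.pa(x)   # reweight features pixel by pixel
```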

22 pages, 2387 KiB  
Article
Android Malware Classification Based on Fuzzy Hashing Visualization
by Horacio Rodriguez-Bazan, Grigori Sidorov and Ponciano Jorge Escamilla-Ambrosio
Mach. Learn. Knowl. Extr. 2023, 5(4), 1826-1847; https://doi.org/10.3390/make5040088 - 28 Nov 2023
Cited by 2 | Viewed by 2229
Abstract
The proliferation of Android-based devices has brought about an unprecedented surge in mobile application usage, making the Android ecosystem a prime target for cybercriminals. In this paper, a new method for Android malware classification is proposed. The method implements a convolutional neural network for malware classification using images. The research presents a novel approach to transforming the Android Application Package (APK) into a grayscale image. The image creation utilizes natural language processing techniques for text cleaning, extraction, and fuzzy hashing to represent the decompiled code from the APK in a set of hashes after preprocessing, where the image is composed of n fuzzy hashes that represent an APK. The method was tested on an Android malware dataset with 15,493 samples of five malware types. The proposed method showed an increase in accuracy compared to others in the literature, achieving up to 98.24% in the classification task. Full article
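One way to picture the hash-to-image step: each fuzzy hash string becomes one grayscale row. This is a simplified stand-in that assumes the fuzzy hashes (e.g., from ssdeep) are already computed and uses a naive character-to-byte mapping rather than the paper's exact encoding:

```python
import numpy as np

def hashes_to_image(fuzzy_hashes: list[str], width: int = 64) -> np.ndarray:
    """Render n fuzzy-hash strings as rows of a grayscale image: each
    character becomes a pixel intensity; rows are padded/truncated to width."""
    rows = []
    for h in fuzzy_hashes:
        vals = [ord(c) % 256 for c in h][:width]
        vals += [0] * (width - len(vals))        # zero-pad short hashes
        rows.append(vals)
    return np.array(rows, dtype=np.uint8)        # shape: (n_hashes, width)
```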

22 pages, 1471 KiB  
Article
Attention-Assisted Feature Comparison and Feature Enhancement for Class-Agnostic Counting
by Liang Dong, Yian Yu, Di Zhang and Yan Huo
Sensors 2023, 23(22), 9126; https://doi.org/10.3390/s23229126 - 11 Nov 2023
Viewed by 1246
Abstract
In this study, we address the class-agnostic counting (CAC) challenge, aiming to count instances in a query image, using just a few exemplars. Recent research has shifted towards few-shot counting (FSC), which involves counting previously unseen object classes. We present ACECount, an FSC framework that combines attention mechanisms and convolutional neural networks (CNNs). ACECount identifies query image–exemplar similarities, using cross-attention mechanisms, enhances feature representations with a feature attention module, and employs a multi-scale regression head, to handle scale variations in CAC. ACECount’s experiments on the FSC-147 dataset exhibited the expected performance. ACECount achieved a reduction of 0.3 in the mean absolute error (MAE) on the validation set and a reduction of 0.26 on the test set of FSC-147, compared to previous methods. Notably, ACECount also demonstrated convincing performance in class-specific counting (CSC) tasks. Evaluation on crowd and vehicle counting datasets revealed that ACECount surpasses FSC algorithms like GMN, FamNet, SAFECount, LOCA, and SPDCN, in terms of performance. These results highlight the robust dataset generalization capabilities of our proposed algorithm. Full article

20 pages, 10786 KiB  
Article
A Binary Fast Image Registration Method Based on Fusion Information
by Huaidan Liang, Chenglong Liu, Xueguang Li and Lina Wang
Electronics 2023, 12(21), 4475; https://doi.org/10.3390/electronics12214475 - 31 Oct 2023
Cited by 1 | Viewed by 1025
Abstract
In the field of airborne aerial imaging, image stitching is often used to expand the field of view. Registration is the foundation of aerial image stitching and directly affects its success and quality. This article develops a fast binary image registration method based on the characteristics of airborne aerial imaging. This method first integrates aircraft parameters and calculates the ground range of the image for coarse registration. Then, based on the characteristics of FAST (Features from Accelerated Segment Test), a new sampling method, named Weighted Angular Diffusion Radial Sampling (WADRS), and matching method are designed. The method proposed in this article can achieve fast registration while ensuring registration accuracy, with a running speed that is approximately four times faster than SURF (Speed Up Robust Features). Additionally, there is no need to manually select any control points before registration. The results indicate that the proposed method can effectively complete remote sensing image registration from different perspectives. Full article
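FAST supplies corners but no descriptors, so matching is usually done by pairing it with a binary descriptor; a plain OpenCV sketch of that baseline (file names and the FAST threshold are placeholders, and WADRS itself is the paper's contribution, not shown):

```python
import cv2

img1 = cv2.imread("aerial_a.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
img2 = cv2.imread("aerial_b.png", cv2.IMREAD_GRAYSCALE)

# FAST detects corners; ORB (itself FAST-based) adds binary descriptors so the
# keypoints can be matched cheaply with Hamming distance.
fast = cv2.FastFeatureDetector_create(threshold=25)
orb = cv2.ORB_create()
kp1, des1 = orb.compute(img1, fast.detect(img1, None))
kp2, des2 = orb.compute(img2, fast.detect(img2, None))

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
```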

14 pages, 8937 KiB  
Article
A Fabric Defect Segmentation Model Based on Improved Swin-Unet with Gabor Filter
by Haitao Xu, Chengming Liu, Shuya Duan, Liangpin Ren, Guozhen Cheng and Bing Hao
Appl. Sci. 2023, 13(20), 11386; https://doi.org/10.3390/app132011386 - 17 Oct 2023
Viewed by 1352
Abstract
Fabric inspection is critical in fabric manufacturing. Automatic detection of fabric defects in the textile industry has always been an important research field. Previously, manual visual inspection was commonly used; however, there were drawbacks such as high labor costs, slow detection speed, and high error rates. Recently, many defect detection methods based on deep learning have been proposed. However, problems need to be solved in the existing methods, such as detection accuracy and interference of complex background textures. In this paper, we propose an efficient segmentation algorithm that combines traditional operators with deep learning networks to alleviate the existing problems. Specifically, we introduce a Gabor filter into the model, which provides the unique advantage of extracting low-level texture features to solve the problem of texture interference and enable the algorithm to converge quickly in the early stages of training. Furthermore, we design a U-shaped architecture that is not completely symmetrical, making model training easier. Meanwhile, multi-stage result fusion is proposed for precise location of defects. The design of this framework significantly improves the detection accuracy and effectively breaks through the limitations of transformer-based models. Experimental results show that on a dataset with one class, a small amount of data, and complex sample background texture, our method achieved 90.03% and 33.70% in ACC and IoU, respectively, which is almost 10% higher than previous state-of-the-art models. Experimental results based on three different fabric datasets consistently show that the proposed model has excellent performance and great application potential in the industrial field. Full article
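A Gabor filter bank of the kind introduced into the model is a one-liner per orientation in OpenCV; the kernel size and wavelength below are illustrative choices, not the paper's:

```python
import cv2
import numpy as np

def gabor_bank(gray: np.ndarray, n_orient: int = 4) -> np.ndarray:
    """Filter a grayscale fabric image with Gabor kernels at several
    orientations and stack the responses as extra texture channels."""
    responses = []
    for k in range(n_orient):
        theta = k * np.pi / n_orient            # orientation of the kernel
        kern = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5, psi=0)
        responses.append(cv2.filter2D(gray, cv2.CV_32F, kern))
    return np.stack(responses, axis=-1)         # (H, W, n_orient)
```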

21 pages, 6594 KiB  
Article
Enhanced YOLOv5: An Efficient Road Object Detection Method
by Hao Chen, Zhan Chen and Hang Yu
Sensors 2023, 23(20), 8355; https://doi.org/10.3390/s23208355 - 10 Oct 2023
Cited by 8 | Viewed by 3975
Abstract
Accurate identification of road objects is crucial for achieving intelligent traffic systems. However, developing efficient and accurate road object detection methods in complex traffic scenarios has always been a challenging task. The objective of this study was to improve the target detection algorithm for road object detection by enhancing the algorithm’s capability to fuse features of different scales and levels, thereby improving the accurate identification of objects in complex road scenes. We propose an improved method called the Enhanced YOLOv5 algorithm for road object detection. By introducing the Bidirectional Feature Pyramid Network (BiFPN) into the YOLOv5 algorithm, we address the challenges of multi-scale and multi-level feature fusion and enhance the detection capability for objects of different sizes. Additionally, we integrate the Convolutional Block Attention Module (CBAM) into the existing YOLOv5 model to enhance its feature representation capability. Furthermore, we employ a new non-maximum suppression technique called Distance Intersection Over Union (DIOU) to effectively address issues such as misjudgment and duplicate detection when significant overlap occurs between bounding boxes. We use mean Average Precision (mAP) and Precision (P) as evaluation metrics. Finally, experimental results on the BDD100K dataset demonstrate that the improved YOLOv5 algorithm achieves a 1.6% increase in object detection mAP, while the P value increases by 5.3%, effectively improving the accuracy and robustness of road object recognition. Full article
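The DIoU criterion used for non-maximum suppression penalizes center distance in addition to overlap; a generic sketch for corner-format boxes (not the authors' code):

```python
import torch

def diou(box1: torch.Tensor, box2: torch.Tensor) -> torch.Tensor:
    """Distance-IoU for boxes in (x1, y1, x2, y2) format: IoU minus the
    squared center distance normalized by the enclosing box diagonal."""
    x1 = torch.max(box1[..., 0], box2[..., 0]); y1 = torch.max(box1[..., 1], box2[..., 1])
    x2 = torch.min(box1[..., 2], box2[..., 2]); y2 = torch.min(box1[..., 3], box2[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area1 = (box1[..., 2] - box1[..., 0]) * (box1[..., 3] - box1[..., 1])
    area2 = (box2[..., 2] - box2[..., 0]) * (box2[..., 3] - box2[..., 1])
    iou = inter / (area1 + area2 - inter + 1e-9)
    # squared distance between box centers
    cd = ((box1[..., 0] + box1[..., 2]) - (box2[..., 0] + box2[..., 2])) ** 2 / 4 \
       + ((box1[..., 1] + box1[..., 3]) - (box2[..., 1] + box2[..., 3])) ** 2 / 4
    # squared diagonal of the smallest enclosing box
    cx1 = torch.min(box1[..., 0], box2[..., 0]); cy1 = torch.min(box1[..., 1], box2[..., 1])
    cx2 = torch.max(box1[..., 2], box2[..., 2]); cy2 = torch.max(box1[..., 3], box2[..., 3])
    diag = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2 + 1e-9
    return iou - cd / diag
```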

17 pages, 4295 KiB  
Article
Few-Shot Air Object Detection Network
by Wei Cai, Xin Wang, Xinhao Jiang, Zhiyong Yang, Xingyu Di and Weijie Gao
Electronics 2023, 12(19), 4133; https://doi.org/10.3390/electronics12194133 - 4 Oct 2023
Viewed by 1004
Abstract
Focusing on the problem of low detection precision caused by the few-shot and multi-scale characteristics of air objects, we propose a few-shot air object detection network (FADNet). We first use a transformer as the backbone network of the model and then build a multi-scale attention mechanism (MAM) to deeply fuse the W- and H-dimension features extracted from the channel dimension and the local and global features extracted from the spatial dimension with the object features to improve the network’s performance when detecting air objects. Second, the neck network is innovated based on the path aggregation network (PANet), resulting in an improved path aggregation network (IPANet). Our proposed network reduces the information lost during feature transfer by introducing a jump connection, utilizes sparse connection convolution, strengthens feature extraction abilities at all scales, and improves the discriminative properties of air object features at all scales. Finally, we propose a multi-scale regional proposal network (MRPN) that can establish multiple RPNs based on the scale types of the output features, utilizing adaptive convolutions to effectively extract object features at each scale and enhancing the ability to process multi-scale information. The experimental results showed that our proposed method exhibits good performance and generalization, especially in the 1-, 2-, 3-, 5-, and 10-shot experiments, with average accuracies of 33.2%, 36.8%, 43.3%, 47.2%, and 60.4%, respectively. The FADNet solves the problems posed by the few-shot characteristics and multi-scale characteristics of air objects, as well as improving the detection capabilities of the air object detection model. Full article

23 pages, 17933 KiB  
Article
Dual Histogram Equalization Algorithm Based on Adaptive Image Correction
by Bowen Ye, Sun Jin, Bing Li, Shuaiyu Yan and Deng Zhang
Appl. Sci. 2023, 13(19), 10649; https://doi.org/10.3390/app131910649 - 25 Sep 2023
Cited by 3 | Viewed by 1426
Abstract
For the visual measurement of moving arm holes in complex working conditions, a histogram equalization algorithm can be used to improve image contrast. To lessen the problems of image brightness shift, image over-enhancement, and gray-level merging that occur with the traditional histogram equalization algorithm, a dual histogram equalization algorithm based on adaptive image correction (AICHE) is proposed. To prevent luminance shifts from occurring during image equalization, the AICHE algorithm protects the average luminance of the input image by improving upon the Otsu algorithm, enabling it to split the histogram. Then, the AICHE algorithm uses the local grayscale correction algorithm to correct the grayscale to prevent the image over-enhancement and gray-level merging problems that arise with the traditional algorithm. It is experimentally verified that the AICHE algorithm can significantly improve the histogram segmentation effect and enhance the contrast and detail information while protecting the average brightness of the input image, and thus the image quality is significantly increased. Full article
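The key idea, splitting the histogram at an Otsu threshold and equalizing each part separately, can be sketched as follows; this simplified version uses the plain Otsu threshold and omits AICHE's improved split and local grayscale correction:

```python
import cv2
import numpy as np

def dual_hist_equalize(gray: np.ndarray) -> np.ndarray:
    """Split the histogram at the Otsu threshold and equalize each half
    independently, which limits the global brightness shift."""
    t, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    out = gray.copy()
    for lo, hi in ((0, int(t)), (int(t) + 1, 255)):
        if hi <= lo:
            continue
        mask = (gray >= lo) & (gray <= hi)
        if mask.sum() == 0:
            continue
        # equalize the sub-range, mapping values back into [lo, hi]
        hist, _ = np.histogram(gray[mask], bins=hi - lo + 1, range=(lo, hi + 1))
        cdf = hist.cumsum() / hist.sum()
        out[mask] = (lo + cdf[gray[mask] - lo] * (hi - lo)).astype(np.uint8)
    return out
```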

15 pages, 4788 KiB  
Article
Saliency-Driven Hand Gesture Recognition Incorporating Histogram of Oriented Gradients (HOG) and Deep Learning
by Farzaneh Jafari and Anup Basu
Sensors 2023, 23(18), 7790; https://doi.org/10.3390/s23187790 - 11 Sep 2023
Cited by 2 | Viewed by 1249
Abstract
Hand gesture recognition is a vital means of communication to convey information between humans and machines. We propose a novel model for hand gesture recognition based on computer vision methods and compare results based on images with complex scenes. While extracting skin color information is an efficient method to determine hand regions, complicated image backgrounds adversely affect recognizing the exact area of the hand shape. Some valuable features like saliency maps, histogram of oriented gradients (HOG), Canny edge detection, and skin color help us maximize the accuracy of hand shape recognition. Considering these features, we proposed an efficient hand posture detection model that improves the test accuracy results to over 99% on the NUS Hand Posture Dataset II and more than 97% on the hand gesture dataset with different challenging backgrounds. In addition, we added noise to around 60% of our datasets. Replicating our experiment, we achieved more than 98% and nearly 97% accuracy on NUS and hand gesture datasets, respectively. Experiments illustrate that the saliency method with HOG has stable performance for a wide range of images with complex backgrounds having varied hand colors and sizes. Full article
19 pages, 21026 KiB  
Article
Detection of Wheat Yellow Rust Disease Severity Based on Improved GhostNetV2
by Zhihui Li, Xin Fang, Tong Zhen and Yuhua Zhu
Appl. Sci. 2023, 13(17), 9987; https://doi.org/10.3390/app13179987 - 4 Sep 2023
Cited by 7 | Viewed by 1754
Abstract
Wheat production safety faces serious challenges because wheat yellow rust is a worldwide disease. In its early stage, wheat yellow rust may show no obvious external symptoms and infection is difficult to detect; in the middle and late stages the symptoms are obvious, but their severity is difficult to distinguish. Traditional deep learning models have large parameter counts, heavy computation, long training times, and high resource consumption, making them difficult to deploy on mobile and edge devices. To address these issues, this study proposes an optimized GhostNetV2. First, a channel rearrangement operation is applied to the output of the Ghost module to increase communication between groups. Then, the first five G-bneck layers of the original GhostNetV2 are replaced with Fused-MBConv to accelerate model training. Finally, the original SE attention mechanism is replaced with ECA to further improve disease identification. In experimental comparisons, the improved algorithm shortens training time by 37.49% and reaches an accuracy of 95.44%, which is 2.24% higher than GhostNetV2, with major improvements in detection accuracy and speed over other lightweight models.
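The channel rearrangement applied after the Ghost module is, in essence, a channel-shuffle operation in the style of ShuffleNet. A minimal PyTorch sketch of that operation follows; the group count is an assumed hyperparameter, and this is not the paper's full module.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Rearrange channels so information flows between groups (as in
    ShuffleNet); x has shape (N, C, H, W) with C divisible by groups."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

y = channel_shuffle(torch.randn(2, 16, 32, 32), groups=4)
```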
19 pages, 6677 KiB  
Article
A Long Skip Connection for Enhanced Color Selectivity in CNN Architectures
by Oscar Sanchez-Cesteros, Mariano Rincon, Margarita Bachiller and Sonia Valladares-Rodriguez
Sensors 2023, 23(17), 7582; https://doi.org/10.3390/s23177582 - 31 Aug 2023
Viewed by 1308
Abstract
Some recent studies show that filters in convolutional neural networks (CNNs) have low color selectivity on datasets of natural scenes such as ImageNet. CNNs, bio-inspired by the visual cortex, are characterized by a hierarchical learning structure that appears to gradually transform the representation space. Inspired by the direct connection between the LGN and V4, which allows V4 to handle low-level information close to the trichromatic input in addition to processed information coming from V2/V3, we propose adding a long skip connection (LSC) between the first and last blocks of the feature extraction stage, so that deeper parts of the network receive information from shallower layers. This type of connection improves classification accuracy by combining simple visual and complex abstract features to create more color-selective ones. We applied this strategy to classic CNN architectures and analyzed the improvement in accuracy quantitatively and qualitatively, focusing on color selectivity. The results show that skip connections in general improve accuracy, but the LSC improves it further and enhances the color selectivity of the original CNN architectures. As a side result, we propose a new color representation procedure for organizing and filtering feature maps, making their visualization more manageable for qualitative color selectivity analysis.
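A long skip connection of this kind can be illustrated with a toy PyTorch network in which shallow first-block features are projected and added to the deep features before the classifier head; the layer sizes here are illustrative assumptions, not the architectures studied in the paper.

```python
import torch
import torch.nn as nn

class TinyLSCNet(nn.Module):
    """Toy feature extractor with a long skip connection (LSC) from the
    first block to the last; block sizes are illustrative only."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.middle = nn.Sequential(
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU())
        self.lsc = nn.Conv2d(32, 64, 1, stride=4)   # match the deep feature shape
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, num_classes))

    def forward(self, x):
        shallow = self.block1(x)                    # simple, color-rich features
        deep = self.middle(shallow)                 # complex, abstract features
        return self.head(deep + self.lsc(shallow))  # combine via the LSC

logits = TinyLSCNet()(torch.randn(1, 3, 32, 32))
```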
19 pages, 2992 KiB  
Article
MCMNET: Multi-Scale Context Modeling Network for Temporal Action Detection
by Haiping Zhang, Fuxing Zhou, Conghao Ma, Dongjing Wang and Wanjun Zhang
Sensors 2023, 23(17), 7563; https://doi.org/10.3390/s23177563 - 31 Aug 2023
Viewed by 1153
Abstract
Temporal action detection is an important and challenging task in video understanding, especially for datasets with significant differences in action duration, where the temporal relationships between action instances are very complex. For such videos, it is necessary to capture information across as rich a range of temporal scales as possible. In this paper, we propose a dual-stream model that can model contextual information at multiple temporal scales. First, the input video is divided into two resolution streams, followed by a Multi-Resolution Context Aggregation module that captures multi-scale temporal information. An Information Enhancement module is added after the high-resolution input stream to model both long-range and short-range contexts. Finally, the outputs of the two modules are merged to obtain features with rich temporal information for action localization and classification. We evaluated the proposed approach on three datasets: on ActivityNet-v1.3, it obtained an average mAP (mean Average Precision) of 32.83%; on Charades, it achieved its best performance, with an average mAP of 27.3%; and on TSU (Toyota Smarthome Untrimmed), it achieved an average mAP of 33.1%.
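The dual-stream idea can be sketched as two temporal streams over snippet features, one at full resolution and one downsampled then upsampled before fusion. The PyTorch layers below are illustrative assumptions and do not reproduce the MCMNET modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoResolutionFusion(nn.Module):
    """Sketch of a dual-stream temporal model: a full-resolution stream and a
    temporally downsampled stream, fused after upsampling."""
    def __init__(self, dim=256):
        super().__init__()
        self.high = nn.Conv1d(dim, dim, 3, padding=1)   # short-range context
        self.low = nn.Conv1d(dim, dim, 3, padding=1)    # long-range context
        self.fuse = nn.Conv1d(2 * dim, dim, 1)

    def forward(self, x):                       # x: (N, dim, T) snippet features
        h = F.relu(self.high(x))
        l = F.relu(self.low(F.avg_pool1d(x, 4)))        # coarse temporal scale
        l = F.interpolate(l, size=x.shape[-1], mode="linear", align_corners=False)
        return self.fuse(torch.cat([h, l], dim=1))

out = TwoResolutionFusion()(torch.randn(2, 256, 128))
```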
16 pages, 9000 KiB  
Article
Infrared Dim and Small Target Sequence Dataset Generation Method Based on Generative Adversarial Networks
by Leihong Zhang, Weihong Lin, Zimin Shen, Dawei Zhang, Banglian Xu, Kaimin Wang and Jian Chen
Electronics 2023, 12(17), 3625; https://doi.org/10.3390/electronics12173625 - 28 Aug 2023
Cited by 2 | Viewed by 1374
Abstract
With the development of infrared technology, infrared dim and small target detection plays a vital role in precision guidance applications. To address the insufficient coverage of existing datasets and the high cost of capturing real footage, this paper proposes a method for generating infrared dim and small target sequence datasets based on generative adversarial networks (GANs). Specifically, an improved deep convolutional generative adversarial network (DCGAN) model is first used to generate clear images of infrared sky backgrounds. Then, target–background sequence images are constructed using multi-scale feature extraction and an improved conditional generative adversarial network. The method fully considers the infrared characteristics of both target and background, effectively expands the image data, and provides a test set for infrared small target detection and recognition algorithms. In addition, expanding the training set can improve classifier performance, enhancing the accuracy of deep-learning-based infrared dim and small target detection. Experimental evaluation shows that the dataset generated by this method is similar to real infrared datasets and that detection accuracy improves after training recent deep learning models with it.
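As a rough sketch of the first stage, a minimal DCGAN-style generator for single-channel infrared sky backgrounds might look as follows in PyTorch; the latent size, layer widths, and 64x64 output are assumptions, and the paper's improvements to DCGAN and the conditional second stage are not reproduced.

```python
import torch
import torch.nn as nn

class IRBackgroundGenerator(nn.Module):
    """Minimal DCGAN-style generator producing 64x64 single-channel images,
    as a stand-in for the paper's improved DCGAN; all sizes are assumed."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh())   # grayscale IR frame

    def forward(self, z):                     # z: (N, z_dim, 1, 1) latent noise
        return self.net(z)

fake = IRBackgroundGenerator()(torch.randn(8, 100, 1, 1))  # (8, 1, 64, 64)
```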
23 pages, 9230 KiB  
Article
Unification of Road Scene Segmentation Strategies Using Multistream Data and Latent Space Attention
by August J. Naudé and Herman C. Myburgh
Sensors 2023, 23(17), 7355; https://doi.org/10.3390/s23177355 - 23 Aug 2023
Viewed by 1091
Abstract
Road scene understanding, as a field of research, has attracted increasing attention in recent years. Developing road scene understanding capabilities that apply to real-world road scenarios has proven complicated, largely because of the cost and complexity of achieving human-level scene understanding, at which road scene elements can be segmented with a mean intersection over union score close to 1.0. A more unified approach to road scene segmentation is needed for use in self-driving systems. Previous works have demonstrated how deep learning methods can be combined to improve the segmentation and perception performance of road scene understanding systems. This paper proposes a novel segmentation system that uses fully connected networks, attention mechanisms, and multiple-input data stream fusion to improve segmentation performance. Results show performance comparable to previous works, with a mean intersection over union of 87.4% on the Cityscapes dataset.
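Since the reported metric is mean intersection over union, a small sketch of how mIoU is computed from integer label maps may be useful; the class count of 19 matches the usual Cityscapes evaluation setup, and the random inputs are placeholders.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union from a confusion matrix; `pred` and
    `target` are integer label maps of the same shape."""
    mask = (target >= 0) & (target < num_classes)
    cm = np.bincount(num_classes * target[mask] + pred[mask],
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - inter
    iou = inter / np.maximum(union, 1)
    return iou[union > 0].mean()

p = np.random.randint(0, 19, (512, 1024))   # placeholder prediction
t = np.random.randint(0, 19, (512, 1024))   # placeholder ground truth
print(mean_iou(p, t, 19))                   # Cityscapes uses 19 eval classes
```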
18 pages, 5485 KiB  
Article
Vision Transformer Customized for Environment Detection and Collision Prediction to Assist the Visually Impaired
by Nasrin Bayat, Jong-Hwan Kim, Renoa Choudhury, Ibrahim F. Kadhim, Zubaidah Al-Mashhadani, Mark Aldritz Dela Virgen, Reuben Latorre, Ricardo De La Paz and Joon-Hyuk Park
J. Imaging 2023, 9(8), 161; https://doi.org/10.3390/jimaging9080161 - 15 Aug 2023
Cited by 4 | Viewed by 1926
Abstract
This paper presents a system that utilizes vision transformers and multimodal feedback modules to facilitate navigation and collision avoidance for the visually impaired. By implementing vision transformers, the system achieves accurate object detection, enabling real-time identification of objects in front of the user. Semantic segmentation and the algorithms developed in this work generate a trajectory vector for each object identified by the vision transformer and detect objects that are likely to intersect with the user's walking path. Audio and vibrotactile feedback modules are integrated to convey collision warnings through multimodal feedback. The dataset used to create the model was captured in both indoor and outdoor settings under different weather conditions, at different times and across multiple days, resulting in 27,867 photos covering 24 classes. Classification results showed good performance (95% accuracy), supporting the efficacy and reliability of the proposed model. The design and control methods of the multimodal feedback modules for collision warning are also presented, while experimental validation of their usability and efficiency remains future work. The demonstrated performance of the vision transformer and the presented algorithms, in conjunction with the multimodal feedback modules, shows promising feasibility and applicability for navigation assistance for individuals with vision impairment.
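The path-intersection step can be illustrated with simple 2D geometry: extrapolate an object's trajectory vector and test whether it crosses the user's walking path as a segment-intersection problem. The sketch below is illustrative only; the coordinates and extrapolation factor are assumptions, not the paper's algorithm.

```python
import numpy as np

def segments_intersect(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly intersect (orientation test)."""
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

# Hypothetical example: extrapolated object track vs. a straight walking path.
obj_now, obj_next = np.array([2.0, 2.0]), np.array([1.0, 3.0])
obj_future = obj_now + 3 * (obj_next - obj_now)         # extrapolate the motion
path_start, path_end = np.array([0.0, 0.0]), np.array([0.0, 10.0])
print(segments_intersect(obj_now, obj_future, path_start, path_end))  # True
```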
18 pages, 4489 KiB  
Article
Center Deviation Measurement of Color Contact Lenses Based on a Deep Learning Model and Hough Circle Transform
by Gi-nam Kim, Sung-hoon Kim, In Joo, Gui-bae Kim and Kwan-hee Yoo
Sensors 2023, 23(14), 6533; https://doi.org/10.3390/s23146533 - 19 Jul 2023
Cited by 3 | Viewed by 1550
Abstract
Ensuring the quality of color contact lenses is vital, particularly in detecting defects during their production, since they are worn directly on the eyes. One significant defect is the “center deviation (CD) defect”, where the colored area (CA) deviates from the center point. Measuring the extent of this deviation is necessary to detect CD defects. In this study, we propose a method that utilizes image processing and analysis techniques to detect such defects. Our approach employs semantic segmentation to simplify the image and reduce noise interference, and uses the Hough circle transform algorithm to measure the deviation of the center point of the CA in color contact lenses. Experimental results demonstrated that our proposed method achieved a 71.2% reduction in error compared with existing research methods.
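The measurement step can be sketched with OpenCV's Hough circle transform: detect the circle of the colored area in a segmentation mask and report its center's offset from the image center. The mask file and Hough parameters below are assumptions, and the paper's segmentation model is replaced by a pre-computed mask.

```python
import cv2
import numpy as np

def center_deviation(mask_path):
    """Locate the colored area with a Hough circle transform and return the
    pixel offset of its center from the image center; the paper's semantic
    segmentation stage is replaced here by a pre-computed mask image."""
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    if mask is None:
        return None
    mask = cv2.medianBlur(mask, 5)
    circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                               param1=100, param2=30,
                               minRadius=50, maxRadius=300)
    if circles is None:
        return None
    cx, cy, _ = circles[0, 0]                   # strongest detected circle
    h, w = mask.shape
    return float(np.hypot(cx - w / 2, cy - h / 2))

dev = center_deviation("ca_mask.png")           # hypothetical mask image
```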
28 pages, 4274 KiB  
Article
Improving Small-Scale Human Action Recognition Performance Using a 3D Heatmap Volume
by Lin Yuan, Zhen He, Qiang Wang, Leiyang Xu and Xiang Ma
Sensors 2023, 23(14), 6364; https://doi.org/10.3390/s23146364 - 13 Jul 2023
Cited by 3 | Viewed by 2044
Abstract
In recent years, skeleton-based human action recognition has garnered significant research attention, with proposed recognition or segmentation methods typically validated on large-scale coarse-grained action datasets. However, research on recognizing small-scale fine-grained human actions with deep learning methods, which has greater practical significance, remains lacking. To address this gap, we propose a novel approach based on heatmap-based pseudo videos and a unified, general model applicable to datasets of all modalities. Leveraging anthropometric kinematics as prior information, we extract motion features common across datasets through an ad hoc pre-trained model. To overcome joint mismatch issues, we partition the human skeleton into five parts, a simple yet effective technique for information sharing. Our approach is evaluated on two datasets: the public Nursing Activities dataset and our self-built Tai Chi Action dataset. Results from the linear evaluation protocol and fine-tuned evaluation demonstrate that our pre-trained model effectively captures motion features common across human actions and achieves steady, precise accuracy in all training settings while mitigating network overfitting. Notably, our model outperforms state-of-the-art models in recognition accuracy when fusing joint and limb modality features along the channel dimension.
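The heatmap-based pseudo-video idea can be sketched as rendering each frame's 2D joints into Gaussian heatmaps and stacking frames into a volume; the resolution, sigma, and joint coordinates below are assumptions for illustration.

```python
import numpy as np

def joints_to_heatmaps(joints, hw=(64, 64), sigma=2.0):
    """Render one frame of 2D joints (J, 2) as a (J, H, W) heatmap stack;
    stacking frames over time yields the 3D heatmap volume."""
    h, w = hw
    ys, xs = np.mgrid[0:h, 0:w]
    maps = np.zeros((len(joints), h, w), dtype=np.float32)
    for j, (x, y) in enumerate(joints):
        maps[j] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

frame = joints_to_heatmaps(np.array([[32.0, 20.0], [30.0, 40.0]]))
volume = np.stack([frame] * 16)        # (T, J, H, W) pseudo-video for a clip
```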
16 pages, 3234 KiB  
Article
Unsupervised Vehicle Re-Identification Based on Cross-Style Semi-Supervised Pre-Training and Feature Cross-Division
by Guowei Zhan, Qi Wang, Weidong Min, Qing Han, Haoyu Zhao and Zitai Wei
Electronics 2023, 12(13), 2931; https://doi.org/10.3390/electronics12132931 - 3 Jul 2023
Viewed by 1055
Abstract
Vehicle Re-Identification (Re-ID) based on Unsupervised Domain Adaptation (UDA) has shown promising performance. However, two main issues remain: (1) existing methods that use Generative Adversarial Networks (GANs) to alleviate the domain gap combine supervised learning with hard labels from the source domain, resulting in a mismatch between style transfer data and hard labels; (2) pseudo-label assignment in the fine-tuning stage is determined solely by similarity measures of global features using clustering algorithms, leading to inevitable label noise. To tackle these issues, this paper proposes an unsupervised vehicle re-identification framework based on cross-style semi-supervised pre-training and feature cross-division. The framework consists of two parts: cross-style semi-supervised pre-training (CSP) and feature cross-division (FCD) for model fine-tuning. The CSP module generates style transfer data containing source domain content and target domain style using a style transfer network, then pre-trains the model in a semi-supervised manner on both the source domain and the style transfer data; a pseudo-label reassignment strategy generates soft labels for the style transfer data. The FCD module obtains feature partitions through a novel interactive division to reduce the dependence of pseudo-labels on global features, and the final similarity measurement combines partition features and global features. Experimental results on the VehicleID and VeRi-776 datasets show that the proposed method outperforms existing unsupervised vehicle re-identification methods: compared with the previous best method on each dataset, it improves mAP by 0.63% and Rank-1 by 0.73% on average over the three sub-datasets of VehicleID, and improves mAP by 0.9% and Rank-1 by 1% on VeRi-776.
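The clustering-based pseudo-labelling that the fine-tuning stage builds on is commonly implemented with DBSCAN over L2-normalized global features, with noise points discarded; the sketch below uses random features and an assumed eps purely for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize

# Sketch of the standard pseudo-labelling step the paper improves on:
# cluster L2-normalized global features; DBSCAN noise points (label -1)
# are discarded. Feature array and eps are assumptions for illustration.
features = normalize(np.random.rand(1000, 2048))      # stand-in for CNN features
labels = DBSCAN(eps=0.5, min_samples=4).fit_predict(features)
keep = labels != -1
n_clusters = labels.max() + 1                         # 0 if everything was noise
print(f"{n_clusters} clusters, {int(keep.sum())} images kept")
```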
18 pages, 595 KiB  
Article
Transfer Learning for Sentiment Classification Using Bidirectional Encoder Representations from Transformers (BERT) Model
by Ali Areshey and Hassan Mathkour
Sensors 2023, 23(11), 5232; https://doi.org/10.3390/s23115232 - 31 May 2023
Cited by 8 | Viewed by 3441
Abstract
Sentiment analysis is currently one of the fastest-emerging research areas due to the large amount of web content coming from social networking websites, and it is a crucial process in most recommender systems. Generally, the purpose of sentiment analysis is to determine an author’s attitude toward a subject or the overall tone of a document. A large body of studies has attempted to predict how useful online reviews will be, with conflicting results on the efficacy of different methodologies. Furthermore, many current solutions employ manual feature generation and conventional shallow learning methods, which restrict generalization. As a result, the goal of this research is to develop a general approach using transfer learning, applying a BERT (Bidirectional Encoder Representations from Transformers)-based model. The efficiency of BERT classification is then evaluated against comparable machine learning techniques. In the experimental evaluation, the proposed model demonstrated superior prediction performance and higher accuracy than earlier research. Comparative tests on positive and negative Yelp reviews show that fine-tuned BERT classification outperforms other approaches. In addition, batch size and sequence length are observed to significantly affect the classification performance of BERT classifiers.
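A minimal fine-tuning step for binary sentiment classification with a BERT model via the Hugging Face transformers library might look as follows; the checkpoint, learning rate, sequence length, and example reviews are assumptions rather than the paper's exact setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# One gradient step of sentiment fine-tuning on two illustrative reviews;
# hyperparameters are assumptions, not the paper's reported configuration.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["Great food and friendly staff.", "Terrible service, never again."]
batch = tok(texts, padding=True, truncation=True, max_length=128,
            return_tensors="pt")
out = model(**batch, labels=torch.tensor([1, 0]))    # 1 = positive, 0 = negative
out.loss.backward()
opt.step()
```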
22 pages, 5873 KiB  
Article
Lightweight Multiscale CNN Model for Wheat Disease Detection
by Xin Fang, Tong Zhen and Zhihui Li
Appl. Sci. 2023, 13(9), 5801; https://doi.org/10.3390/app13095801 - 8 May 2023
Cited by 12 | Viewed by 3141
Abstract
Wheat disease detection is crucial for disease diagnosis, optimized pesticide application, disease control, and improved wheat yield and quality. However, detecting wheat diseases is difficult because of their many types, and detection in complex fields is especially challenging. Traditional models are hard to deploy on mobile devices because of their large parameter counts and high computation and resource requirements. To address these issues, this paper combines the residual module and the inception module to construct a lightweight multiscale CNN model, introduces the CBAM and ECA modules into the residual block to enhance the model’s attention to diseases, and reduces the influence of complex backgrounds on disease recognition. The proposed method reaches an accuracy of 98.7% on the test dataset, higher than classic convolutional neural networks such as AlexNet, VGG16, and Inception-ResNet-v2, and than lightweight models such as MobileNetV3 and EfficientNet-b0. The proposed model has superior performance and can be deployed on mobile terminals to quickly identify wheat diseases.
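Of the two attention modules mentioned, the channel half of CBAM is easy to sketch: a shared MLP over average- and max-pooled channel descriptors produces per-channel weights. The PyTorch sketch below omits CBAM's spatial half and uses an assumed reduction ratio.

```python
import torch
import torch.nn as nn

class CBAMChannelAttention(nn.Module):
    """Channel half of CBAM: a shared MLP over avg- and max-pooled channel
    descriptors (the spatial half and residual wiring are omitted)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                          # x: (N, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))         # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))          # max-pooled descriptor
        scale = torch.sigmoid(avg + mx)[:, :, None, None]
        return x * scale

y = CBAMChannelAttention(64)(torch.randn(2, 64, 28, 28))
```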
19 pages, 8938 KiB  
Article
Development of an Accurate and Automated Quality Inspection System for Solder Joints on Aviation Plugs Using Fine-Tuned YOLOv5 Models
by Junwei Sha, Junpu Wang, Huanran Hu, Yongqiang Ye and Guili Xu
Appl. Sci. 2023, 13(9), 5290; https://doi.org/10.3390/app13095290 - 23 Apr 2023
Cited by 9 | Viewed by 2322
Abstract
The quality inspection of solder joints on aviation plugs is extremely important in modern manufacturing. However, this task is still mostly performed by skilled workers after welding, posing problems of subjective judgment and low efficiency. To address these issues, an accurate and automated detection system using fine-tuned YOLOv5 models is developed in this paper. First, we design an intelligent image acquisition system that automatically obtains a high-resolution image of each solder joint. Then, a two-phase approach is proposed for fast and accurate weld quality detection. In the first phase, a fine-tuned YOLOv5 model extracts the region of interest (ROI), i.e., the row of solder joints to be inspected, from the whole image; a sliding platform then automatically moves the ROI to the center of the image to enhance imaging clarity. In the second phase, another fine-tuned YOLOv5 model takes this adjusted ROI as input and performs quality assessment. Finally, a concise and easy-to-use GUI has been designed and deployed on real production lines. Experimental results on the actual production line show that the proposed method achieves a detection accuracy of more than 97.5% with a detection speed of about 0.1 s, meeting the needs of actual production.
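The two-phase inference could be sketched with YOLOv5 models loaded through torch.hub, where the first model finds the joint row and the second assesses the cropped row; the weight files and image below are hypothetical placeholders, not the authors' trained models.

```python
import cv2
import torch

# Sketch of the two-phase idea using ultralytics/yolov5 via torch.hub;
# "roi.pt", "joints.pt", and "plug.jpg" are hypothetical placeholders.
roi_model = torch.hub.load("ultralytics/yolov5", "custom", path="roi.pt")
joint_model = torch.hub.load("ultralytics/yolov5", "custom", path="joints.pt")

img = cv2.cvtColor(cv2.imread("plug.jpg"), cv2.COLOR_BGR2RGB)

rois = roi_model(img).xyxy[0]                      # phase 1: find the joint row
if len(rois):
    x1, y1, x2, y2, *_ = rois[0].tolist()          # top-confidence detection
    crop = img[int(y1):int(y2), int(x1):int(x2)]
    verdicts = joint_model(crop).pandas().xyxy[0]  # phase 2: per-joint quality
    print(verdicts[["name", "confidence"]])
```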
15 pages, 7610 KiB  
Article
FM-STDNet: High-Speed Detector for Fast-Moving Small Targets Based on Deep First-Order Network Architecture
by Xinyu Hu, Defeng Kong, Xiyang Liu, Junwei Zhang and Daode Zhang
Electronics 2023, 12(8), 1829; https://doi.org/10.3390/electronics12081829 - 12 Apr 2023
Cited by 3 | Viewed by 1435
Abstract
Identifying objects of interest from digital vision signals is a core task of intelligent systems. However, fast and accurate identification of small moving targets in real time has become a bottleneck in target detection. This paper investigates the real-time detection of fast-moving tiny targets on printed circuit boards (PCBs). The task is very challenging because PCB defects are usually small relative to the whole board, and, in pursuit of production efficiency, the PCB typically moves very fast during production, placing higher demands on the real-time performance of intelligent systems. To this end, a new model, FM-STDNet (Fast Moving Small Target Detection Network), is proposed based on the well-known YOLO (You Only Look Once) family of deep learning detectors. First, building on SPPNet (Spatial Pyramid Pooling Networks), a new spatial pyramid pooling module, SPPFCSP (Spatial Pyramid Pooling Fast Cross Stage Partial Network), is designed to extract features at different scales from inputs of different sizes while retaining the high semantic information of smaller features. Then, an anchor-free mode is introduced to predict classification and regression directly, and structural reparameterization is used to design a new high-speed prediction head, RepHead, which further improves the detector's speed. Experimental results show that the proposed detector achieves 99.87% detection accuracy at the fastest speed in the fast-moving PCB surface defect detection task, compared with state-of-the-art detectors such as YOLOv3, Faster R-CNN, and TDD-Net. FM-STDNet provides an effective reference for fast-moving small target detection tasks.
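For reference, the classic SPP block that SPPFCSP builds on concatenates parallel max-pools of several kernel sizes with the input; the PyTorch sketch below shows that standard block, not the paper's SPPFCSP variant.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Classic spatial pyramid pooling block (as used in YOLO variants):
    parallel max-pools at several kernel sizes, concatenated with the input."""
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels)

    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

y = SPP()(torch.randn(1, 256, 20, 20))   # -> (1, 1024, 20, 20)
```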
20 pages, 19138 KiB  
Article
Improved Mask R-CNN Multi-Target Detection and Segmentation for Autonomous Driving in Complex Scenes
by Shuqi Fang, Bin Zhang and Jingyu Hu
Sensors 2023, 23(8), 3853; https://doi.org/10.3390/s23083853 - 10 Apr 2023
Cited by 19 | Viewed by 5455
Abstract
Vision-based target detection and segmentation is an important research area for environment perception in autonomous driving, but mainstream algorithms suffer from low detection accuracy and poor mask segmentation quality for multi-target detection and segmentation in complex traffic scenes. To address this, this paper improves Mask R-CNN by replacing the ResNet backbone with ResNeXt, whose grouped convolutions further improve the model's feature extraction capability. Furthermore, a bottom-up path enhancement strategy is added to the Feature Pyramid Network (FPN) to achieve feature fusion, and an efficient channel attention (ECA) module is added to the backbone feature extraction network to refine the high-level, low-resolution semantic feature maps. Finally, the smooth L1 bounding box regression loss is replaced with CIoU loss to speed up model convergence and minimize error. Experiments show that the improved Mask R-CNN achieves 62.62% mAP for target detection and 57.58% mAP for segmentation on the publicly available Cityscapes autonomous driving dataset, 4.73% and 3.96% better than the original Mask R-CNN, respectively. Migration experiments show good detection and segmentation performance in each traffic scenario of the publicly available BDD autonomous driving dataset.
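The CIoU loss that replaces smooth L1 here is standard and can be written out directly: one minus IoU, plus a normalized center-distance penalty and an aspect-ratio consistency term. A PyTorch sketch for boxes in (x1, y1, x2, y2) format, with illustrative example boxes:

```python
import math
import torch

def ciou_loss(box1, box2, eps=1e-7):
    """CIoU loss for (x1, y1, x2, y2) boxes: 1 - IoU + center-distance
    penalty over the enclosing-box diagonal + aspect-ratio term."""
    x1 = torch.max(box1[..., 0], box2[..., 0])
    y1 = torch.max(box1[..., 1], box2[..., 1])
    x2 = torch.min(box1[..., 2], box2[..., 2])
    y2 = torch.min(box1[..., 3], box2[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    w1, h1 = box1[..., 2] - box1[..., 0], box1[..., 3] - box1[..., 1]
    w2, h2 = box2[..., 2] - box2[..., 0], box2[..., 3] - box2[..., 1]
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / (union + eps)
    cw = torch.max(box1[..., 2], box2[..., 2]) - torch.min(box1[..., 0], box2[..., 0])
    ch = torch.max(box1[..., 3], box2[..., 3]) - torch.min(box1[..., 1], box2[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps                       # enclosing-box diagonal^2
    rho2 = ((box1[..., 0] + box1[..., 2] - box2[..., 0] - box2[..., 2]) ** 2
            + (box1[..., 1] + box1[..., 3] - box2[..., 1] - box2[..., 3]) ** 2) / 4
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps))
                              - torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

loss = ciou_loss(torch.tensor([0., 0., 10., 10.]), torch.tensor([2., 2., 12., 12.]))
```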
15 pages, 5355 KiB  
Article
Insights into Batch Selection for Event-Camera Motion Estimation
by Juan L. Valerdi, Chiara Bartolozzi and Arren Glover
Sensors 2023, 23(7), 3699; https://doi.org/10.3390/s23073699 - 3 Apr 2023
Cited by 2 | Viewed by 1792
Abstract
Event cameras measure scene changes with high temporal resolution, making them well suited for visual motion estimation. Pixel activations produce an asynchronous stream of digital data (events) that rolls continuously over time, without the discrete temporal boundaries typical of frame-based cameras (where a data packet or frame is emitted at a fixed temporal rate). It is therefore not trivial to define a priori how to group or accumulate events in a way that is sufficient for computation, and the suitable number of events can vary greatly across environments, motion patterns, and tasks. In this paper, we use neural networks for rotational motion estimation as a scenario to investigate the appropriate selection of event batches for populating input tensors. Our results show that batch selection has a large impact: training should be performed on a wide variety of different batches, regardless of the batch selection method; for inference, a simple fixed-time window is a good choice relative to fixed-count batches and performs comparably to more complex methods. Our initial hypothesis, that a minimal number of events is required to estimate motion (as in contrast maximization), does not hold when estimating motion with a neural network.
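The two batching strategies compared can be sketched in a few lines of NumPy over a toy event stream of (timestamp, x, y, polarity) rows; the event counts and window length are assumptions for illustration.

```python
import numpy as np

# Toy event stream: rows of (timestamp_us, x, y, polarity), time-sorted.
events = np.random.rand(100_000, 4)
events[:, 0] = np.sort(events[:, 0]) * 1e6          # timestamps in microseconds

def fixed_count_batches(ev, n=5000):
    """Group events into batches containing a fixed number of events."""
    return [ev[i:i + n] for i in range(0, len(ev), n)]

def fixed_time_batches(ev, window_us=10_000):
    """Group events into fixed-duration windows (the simple strategy the
    paper found to be a good default at inference time)."""
    starts = np.arange(ev[0, 0], ev[-1, 0], window_us)
    idx = np.append(np.searchsorted(ev[:, 0], starts), len(ev))
    return [ev[a:b] for a, b in zip(idx[:-1], idx[1:]) if b > a]

print(len(fixed_count_batches(events)), len(fixed_time_batches(events)))
```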
10 pages, 1549 KiB  
Article
Self-Supervised Facial Motion Representation Learning via Contrastive Subclips
by Zheng Sun, Shad A. Torrie, Andrew W. Sumsion and Dah-Jye Lee
Electronics 2023, 12(6), 1369; https://doi.org/10.3390/electronics12061369 - 13 Mar 2023
Viewed by 1406
Abstract
Facial motion representation learning has become an exciting research topic as biometric technologies become more common in daily life. One of its applications is identity verification: after recording a dynamic facial motion video for enrollment, the user must show a matching facial appearance and perform the same facial motion as at enrollment to be authenticated. Recent research papers have discussed the benefits of this new biometric technology and reported promising results for both static and dynamic facial motion verification tasks. Our work extends existing approaches and introduces compound facial actions, which contain more than one dominant facial action in a single utterance. We propose a new self-supervised pretraining method called contrastive subclips that improves model performance on these more complex and secure facial motions. Experimental results show that the contrastive subclips method improves upon the baseline approaches, with model performance on test data reaching 89.7% average precision.
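The paper's exact objective is not given in the abstract, but contrastive pretraining over subclips is typically built on an InfoNCE-style loss, where embeddings of subclips from the same video are positives and all other pairs are negatives. A generic PyTorch sketch, not the authors' formulation:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    """Generic InfoNCE loss: z1[i] and z2[i] are embeddings of two subclips
    of the same video (positives); all other pairs act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                 # (N, N) cosine similarities
    labels = torch.arange(len(z1))             # matching index = positive pair
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```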
30 pages, 752 KiB  
Review
The Challenges of Recognizing Offline Handwritten Chinese: A Technical Review
by Lu Shen, Bidong Chen, Jianjing Wei, Hui Xu, Su-Kit Tang and Silvia Mirri
Appl. Sci. 2023, 13(6), 3500; https://doi.org/10.3390/app13063500 - 9 Mar 2023
Cited by 7 | Viewed by 3683
Abstract
Offline handwritten Chinese recognition is an important research area of pattern recognition, comprising offline handwritten Chinese character recognition (offline HCCR) and offline handwritten Chinese text recognition (offline HCTR), both closely related to daily life. With new deep learning techniques and their combination with other domain knowledge, offline handwritten Chinese recognition has achieved breakthroughs in methods and performance in recent years. However, no article has provided a technical review of this field covering work since 2016. In light of this, this paper reviews the research progress and challenges of offline handwritten Chinese recognition from 2016 to 2022, covering traditional techniques, deep learning methods, methods combining deep learning with traditional techniques, and knowledge from other areas. Firstly, it introduces the research background and status of handwritten Chinese recognition, standard datasets, and evaluation metrics. Secondly, it provides a comprehensive summary and analysis of offline HCCR and offline HCTR approaches during the last seven years, explaining their concepts, specifics, and performance. Finally, it presents the main research problems in this field over the past few years and discusses the challenges that remain in offline handwritten Chinese recognition, aiming to inspire future research.
18 pages, 3434 KiB  
Article
Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images
by Agus Nursikuwagus, Rinaldi Munir and Masayu Leylia Khodra
J. Imaging 2022, 8(11), 294; https://doi.org/10.3390/jimaging8110294 - 22 Oct 2022
Cited by 2 | Viewed by 2945
Abstract
Captioning is the process of assembling a description for an image. Previous research on captioning has usually focused on foreground objects. In captioning concepts, there are two main objects for discussion: the background object and the foreground object. In contrast to previous image-captioning research, generating captions from geological images of rocks focuses more on the background of the images. This study proposes image captioning using a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec to generate words from an image, with a dense output of 256 units. To form grammatical sentences, the sequence of predicted words is reconstructed by a beam search algorithm with K = 3. The pre-trained baseline model VGG16 and our proposed CNN-A, CNN-B, CNN-C, and CNN-D models were evaluated with N-gram BLEU scores. The BLEU-1 scores achieved with these models were 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively; BLEU-2 scores were 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578; BLEU-3 scores were 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307; and BLEU-4 scores were 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537. Our CNN-C model outperformed the other models, especially the baseline. Future challenges in caption studies include geological sentence structure, geological sentence phrases, and constructing words with a geological tagger.
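Beam search with K = 3 can be sketched over per-step word distributions; the toy version below ignores the fact that the real LSTM conditions each step on the words chosen so far, and the vocabulary and probabilities are random placeholders.

```python
import numpy as np

def beam_search(step_probs, k=3):
    """Toy beam search over per-step word distributions; `step_probs` is a
    (T, V) array of next-word probabilities (a stand-in for the LSTM, which
    would really condition each step on the words chosen so far)."""
    beams = [([], 0.0)]                            # (word ids, log-probability)
    for probs in step_probs:
        candidates = [(seq + [w], score + np.log(probs[w]))
                      for seq, score in beams for w in range(len(probs))]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams[0][0]                             # highest-scoring sequence

vocab_probs = np.random.dirichlet(np.ones(50), size=8)   # 8 steps, 50 words
print(beam_search(vocab_probs, k=3))
```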
12 pages, 4018 KiB  
Article
Face Anti-Spoofing Method Based on Residual Network with Channel Attention Mechanism
by Yueping Kong, Xinyuan Li, Guangye Hao and Chu Liu
Electronics 2022, 11(19), 3056; https://doi.org/10.3390/electronics11193056 - 25 Sep 2022
Cited by 7 | Viewed by 2655
Abstract
Face recognition systems are vulnerable to spoofing attacks using photos or videos of a valid user’s face. However, edge degradation and texture blurring occur when non-living face images are used to attack a face recognition system. With this in mind, a novel face anti-spoofing method is proposed that combines a residual network with a channel attention mechanism. In our method, the residual network extracts texture differences between face images, while the attention mechanism focuses on the differences in shadow and edge features in the nasal and cheek areas between living and non-living face images. It assigns weights to the different filter features of the face image, enhancing the network’s ability to extract and express key features in the nasal and cheek regions and improving detection accuracy. Experiments were performed on the public Replay-Attack and CASIA-FASD face anti-spoofing datasets. We found that the best value of the attention reduction parameter r for face anti-spoofing is 16, with accuracies of 99.98% and 97.75% on the two datasets, respectively. Furthermore, to test robustness to illumination changes, the experiment was also performed on datasets with lighting changes and achieved good results.
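A channel attention block with a reduction parameter r of the kind tuned here is typically SE-style; a minimal PyTorch sketch with the reported best value r = 16 follows, as a generic illustration rather than the paper's exact module.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention; r is the reduction ratio
    tuned in the paper (reported best value: r = 16)."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(),
            nn.Linear(channels // r, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # squeeze -> excitation weights
        return x * w[:, :, None, None]

y = SEBlock(64, r=16)(torch.randn(2, 64, 56, 56))
```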