Computer Vision and Pattern Recognition with Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (31 March 2023) | Viewed by 31718

Special Issue Editor


Prof. Dr. Teng Li
Guest Editor
School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
Interests: computer vision; pattern recognition; multimedia computing

Special Issue Information

Dear Colleagues,

Computer vision and pattern recognition are fundamental problems in artificial intelligence and are also natural application areas for mathematical theory and tools. Computer vision enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and to take actions or make recommendations based on that information. Pattern recognition is the process of recognizing patterns in data, typically using machine learning algorithms. Recent years have witnessed the rapid expansion of computer vision and pattern recognition, and a wide range of applications based on them can be seen everywhere, e.g., object detection, recognition, segmentation, classification, content generation, and multimedia analysis. In this Special Issue, we aim to assemble recent advances in computer vision, pattern recognition, and related extended applications.

Prof. Dr. Teng Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • pattern classification and clustering
  • machine learning, neural network, and deep learning
  • theory in computer vision and pattern recognition
  • low-level vision, image processing, and machine vision
  • 3D computer vision and reconstruction
  • object detection, tracking, recognition, and action recognition
  • data mining and signal processing
  • multimedia/multimodal analysis and applications
  • biomedical image processing and analysis
  • medical image analysis and applications
  • graph theory and its applications
  • vision analysis and understanding
  • vision for robots and autonomous driving
  • vision applications and systems
  • vision and language


Published Papers (18 papers)


Research

16 pages, 776 KiB  
Article
Redesigning Embedding Layers for Queries, Keys, and Values in Cross-Covariance Image Transformers
by Jaesin Ahn , Jiuk Hong, Jeongwoo Ju and Heechul Jung
Mathematics 2023, 11(8), 1933; https://doi.org/10.3390/math11081933 - 19 Apr 2023
Viewed by 1319
Abstract
Several attempts have been made in vision transformers to reduce the quadratic time complexity in the number of tokens to linear time complexity. Cross-covariance image transformers (XCiT) are one of the techniques used to address this issue. However, despite these efforts, increasing the token dimension still results in quadratic growth in time complexity, and the dimension is a key parameter for achieving superior generalization performance. In this paper, a novel method is proposed to improve the generalization performance of XCiT models without increasing token dimensions. We redesigned the embedding layers of queries, keys, and values into three variants: separate non-linear embedding (SNE), partially shared non-linear embedding (P-SNE), and fully shared non-linear embedding (F-SNE). Finally, the proposed structure with different model size settings achieved 71.4%, 77.8%, and 82.1% on ImageNet-1k, compared with the 69.9%, 77.1%, and 82.0% acquired by the original XCiT models, namely XCiT-N12, XCiT-T12, and XCiT-S12, respectively. Additionally, the proposed model achieved 94.8% in transfer learning experiments, on average, for CIFAR-10, CIFAR-100, Stanford Cars, and STL-10, which is superior to the baseline model XCiT-S12 (94.5%). In particular, the proposed models demonstrated considerable improvements on the out-of-distribution detection task compared to the original XCiT models.
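As a rough illustration of the embedding-layer redesign described in this abstract, the sketch below contrasts a separate and a fully shared non-linear embedding for queries, keys, and values in PyTorch. The two-layer MLP, its width, and the GELU activation are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class NonLinearQKVEmbedding(nn.Module):
    """Hedged sketch of redesigned Q/K/V embeddings for a cross-covariance
    attention block. 'separate' uses one small non-linear MLP per stream
    (roughly the SNE idea); 'shared' reuses a single MLP for all three
    (roughly the F-SNE idea). Depths, widths, and activations are
    illustrative assumptions, not the authors' exact layers."""

    def __init__(self, dim: int, mode: str = "separate"):
        super().__init__()

        def mlp():
            return nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

        if mode == "shared":
            shared = mlp()
            self.q_embed = self.k_embed = self.v_embed = shared  # one MLP, three uses
        else:
            self.q_embed, self.k_embed, self.v_embed = mlp(), mlp(), mlp()

    def forward(self, x: torch.Tensor):
        # x: (batch, num_tokens, dim) -> q, k, v of the same shape
        return self.q_embed(x), self.k_embed(x), self.v_embed(x)
```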

15 pages, 1396 KiB  
Article
VRR-Net: Learning Vehicle–Road Relationships for Vehicle Trajectory Prediction on Highways
by Tingzhang Zhan, Qieshi Zhang, Guangxi Chen and Jun Cheng
Mathematics 2023, 11(6), 1293; https://doi.org/10.3390/math11061293 - 08 Mar 2023
Cited by 1 | Viewed by 1298
Abstract
Vehicle trajectory prediction is an important basis for the decision-making and planning of autonomous driving systems, enabling them to drive safely and efficiently. To accurately predict vehicle trajectories, the complex representations and dynamic interactions among the elements in a traffic scene must be abstracted and modelled. This paper presents the vehicle–road relationships network (VRR-Net), a deep learning network that extracts features from vehicle–road relationships and models the effects of traffic environments on vehicles. The introduction of geographic highway information and the calculation of spatiotemporal distances with respect to a reference not only unify heterogeneous vehicle–road relationship representations into a time-series vector but also reduce the requirement for sensing transient changes in the surrounding area. A hierarchical long short-term memory network extracts environmental features from two perspectives, social interactions and road constraints, and predicts the future trajectories of vehicles by their manoeuvre categories. Accordingly, VRR-Net fully exploits the contributions of historical trajectories and integrates the effects of road constraints, achieving performance that is comparable to or better than that of state-of-the-art methods on the Next Generation Simulation (NGSIM) dataset.
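The following sketch illustrates, under stated assumptions, the kind of hierarchical LSTM described in this abstract: one LSTM encodes social-interaction features, another encodes road-constraint features, and simple heads predict a manoeuvre class and per-class future trajectories. The feature dimensions, the number of manoeuvre classes, and the linear heads are hypothetical choices, not the authors' architecture.

```python
import torch
import torch.nn as nn

class HierarchicalTrajectoryEncoder(nn.Module):
    """Hedged sketch of a two-branch LSTM encoder for trajectory prediction."""

    def __init__(self, social_dim=16, road_dim=8, hidden=64,
                 horizon=25, num_manoeuvres=3):
        super().__init__()
        self.social_lstm = nn.LSTM(social_dim, hidden, batch_first=True)
        self.road_lstm = nn.LSTM(road_dim, hidden, batch_first=True)
        self.manoeuvre_head = nn.Linear(2 * hidden, num_manoeuvres)
        # one (x, y) trajectory head per manoeuvre class
        self.traj_head = nn.Linear(2 * hidden, num_manoeuvres * horizon * 2)
        self.horizon, self.k = horizon, num_manoeuvres

    def forward(self, social_seq, road_seq):
        # social_seq: (B, T, social_dim), road_seq: (B, T, road_dim)
        _, (hs, _) = self.social_lstm(social_seq)
        _, (hr, _) = self.road_lstm(road_seq)
        h = torch.cat([hs[-1], hr[-1]], dim=-1)          # (B, 2*hidden)
        manoeuvre_logits = self.manoeuvre_head(h)        # (B, K)
        trajs = self.traj_head(h).view(-1, self.k, self.horizon, 2)
        return manoeuvre_logits, trajs
```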

12 pages, 1431 KiB  
Article
Reconstructing a 3D Medical Image from a Few 2D Projections Using a B-Spline-Based Deformable Transformation
by Hui Yan and Jianrong Dai
Mathematics 2023, 11(1), 69; https://doi.org/10.3390/math11010069 - 25 Dec 2022
Viewed by 1354
Abstract
(1) Background: There is a need for 3D image reconstruction from a series of 2D projections in medical applications. However, additional exposure to X-ray projections may harm human health. Minimizing the number of projections reduces X-ray exposure, but it also causes significant image noise and artifacts. (2) Purpose: In this study, a method was proposed for the reconstruction of a 3D image from a minimal set of 2D X-ray projections using a B-spline-based deformable transformation. (3) Methods: The inputs of this method were a 3D image acquired in a previous treatment, used as a prior image, and a minimal set of 2D projections acquired during the current treatment. The goal was to reconstruct a new 3D image for the current treatment from these two inputs. The new 3D image was deformed from the prior image via displacement matrices interpolated from B-spline coefficients. The B-spline coefficients were solved with an objective function defined as the mean square error between the reconstructed and ground-truth projections. In the optimization process, the gradient of the objective function was calculated, and the B-spline coefficients were then updated. For acceleration, the 2D and 3D image reconstructions and the B-spline interpolation were implemented on a graphics processing unit (GPU). (4) Results: When the scan angles were more than 60°, the image quality was significantly improved, and the reconstructed image was comparable to the ground-truth image. When the scan angles were less than 30°, the image quality was significantly degraded. The influence of the scan orientation on image quality was minor. With GPU acceleration, the reconstruction efficiency improved roughly a hundredfold compared with a conventional CPU implementation. (5) Conclusions: The proposed method was able to generate a high-quality 3D image using a few 2D projections, amounting to ~20% of the total projections required for a standard image. The introduction of the B-spline-interpolated displacement matrix was effective in suppressing noise in the reconstructed image. This method could significantly reduce the imaging time and the radiation exposure of patients under treatment.
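A minimal sketch of the optimisation loop described in this abstract is given below, assuming differentiable placeholder functions project_fn (forward projector) and bspline_warp (B-spline interpolation plus warping) are available; the optimiser, grid size, and iteration count are likewise assumptions.

```python
import torch

def reconstruct(prior_volume, measured_projections, project_fn, bspline_warp,
                grid_shape=(8, 8, 8), iters=200, lr=0.1):
    """Hedged sketch: B-spline coefficients define a displacement field that
    deforms the prior 3D image; the loss is the MSE between projections of
    the deformed volume and the measured 2D projections. `project_fn` and
    `bspline_warp` are assumed, user-provided differentiable functions."""
    coeffs = torch.zeros((3, *grid_shape), requires_grad=True)  # displacement coefficients
    opt = torch.optim.Adam([coeffs], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        warped = bspline_warp(prior_volume, coeffs)            # deform prior image
        loss = torch.mean((project_fn(warped) - measured_projections) ** 2)
        loss.backward()                                        # gradient of the objective
        opt.step()                                             # update B-spline coefficients
    return bspline_warp(prior_volume, coeffs).detach()
```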

15 pages, 16165 KiB  
Article
Deep Multi-Task Learning for an Autoencoder-Regularized Semantic Segmentation of Fundus Retina Images
by Ge Jin, Xu Chen and Long Ying
Mathematics 2022, 10(24), 4798; https://doi.org/10.3390/math10244798 - 16 Dec 2022
Cited by 2 | Viewed by 1408
Abstract
Automated segmentation of retinal blood vessels is necessary for the diagnosis, monitoring, and treatment planning of retinal disease. Although current U-shaped models have achieved outstanding performance, some challenges remain due to the nature of this problem and of mainstream models. (1) There is no effective framework to obtain and incorporate features with different spatial and semantic information at multiple levels. (2) Fundus retina images with high-quality blood vessel segmentations are relatively rare. (3) The information in edge regions, which are the most difficult parts to segment, has not received adequate attention. In this work, we propose a novel encoder–decoder architecture based on the multi-task learning paradigm to tackle these challenges. The shared image encoder is regularized by conducting a reconstruction task in the VQ-VAE (Vector Quantized Variational AutoEncoder) branch to improve its generalization ability. Meanwhile, hierarchical representations are generated and integrated to complement the input image. An edge attention module is designed to make the model capture edge-focused feature representations via deep supervision, focusing on the target edge regions that are most difficult to recognize. Extensive evaluations on three publicly accessible datasets demonstrate that the proposed model outperforms current state-of-the-art methods.
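As a hedged illustration of the multi-task objective sketched in this abstract, the snippet below combines a segmentation loss, an autoencoder reconstruction loss that regularises the shared encoder, and a deeply supervised edge loss. The specific loss functions and weights are assumptions; the authors' VQ-VAE branch would additionally include codebook and commitment terms.

```python
import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, seg_target, recon, image, edge_logits, edge_target,
                   w_recon=0.5, w_edge=0.5):
    """Hedged sketch of a segmentation + reconstruction + edge-supervision loss."""
    seg_loss = F.binary_cross_entropy_with_logits(seg_logits, seg_target)  # vessel mask
    recon_loss = F.mse_loss(recon, image)                                  # encoder regularizer
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_target)  # deep supervision
    return seg_loss + w_recon * recon_loss + w_edge * edge_loss
```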

17 pages, 3377 KiB  
Article
Semantic Segmentation of UAV Images Based on Transformer Framework with Context Information
by Satyawant Kumar, Abhishek Kumar and Dong-Gyu Lee
Mathematics 2022, 10(24), 4735; https://doi.org/10.3390/math10244735 - 13 Dec 2022
Cited by 5 | Viewed by 2679
Abstract
With advances in Unmanned Aerial Vehicle (UAV) technology, aerial images with huge variations in the appearance of objects and complex backgrounds have opened a new direction of work for researchers. The task of semantic segmentation becomes more challenging when capturing inherent features in the global and local context of UAV images. In this paper, we propose a transformer-based encoder–decoder architecture to address this issue for the precise segmentation of UAV images. The inherent feature representation of the UAV images is exploited in the encoder network using a self-attention-based transformer framework to capture long-range global contextual information. A Token Spatial Information Fusion (TSIF) module is proposed to take advantage of a convolution mechanism that can capture local details. It fuses the local contextual details of neighboring pixels with the encoder network and produces semantically rich feature representations. We also propose a decoder network that processes the output of the encoder network for the final semantic-level prediction of each pixel. We demonstrate the effectiveness of this architecture on the UAVid and Urban Drone datasets, where we achieved mIoU of 61.93% and 73.65%, respectively.
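The sketch below illustrates one plausible form of the token/spatial fusion idea described here: local details from a depth-wise convolution are fused with the transformer token features. The 3x3 depth-wise convolution and concatenation-plus-projection fusion are assumptions, not the TSIF module's exact design.

```python
import torch
import torch.nn as nn

class TokenSpatialFusion(nn.Module):
    """Hedged sketch of fusing convolutional local details with transformer tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)  # depth-wise conv
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, tokens: torch.Tensor, h: int, w: int):
        # tokens: (B, h*w, dim) -> reshape to a feature map for the conv branch
        b, n, c = tokens.shape
        fmap = tokens.transpose(1, 2).reshape(b, c, h, w)
        local = self.local(fmap).flatten(2).transpose(1, 2)       # (B, h*w, dim)
        return self.proj(torch.cat([tokens, local], dim=-1))      # fused tokens
```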

19 pages, 8565 KiB  
Article
Learning Adaptive Spatial Regularization and Temporal-Aware Correlation Filters for Visual Object Tracking
by Liqiang Liu, Tiantian Feng, Yanfang Fu, Chao Shen, Zhijuan Hu, Maoyuan Qin, Xiaojun Bai and Shifeng Zhao
Mathematics 2022, 10(22), 4320; https://doi.org/10.3390/math10224320 - 17 Nov 2022
Viewed by 931
Abstract
Recently, discriminative correlation filter (DCF)-based trackers have gained much attention and achieved remarkable results thanks to their high efficiency and outstanding performance. However, undesirable boundary effects occur when DCF-based trackers face challenging situations, such as occlusion, background clutter, and fast motion. To address these problems, this work proposes a novel adaptive spatial regularization and temporal-aware correlation filters (ASTCF) model to deal with the boundary effects that occur in correlation filter tracking. Firstly, the ASTCF model learns a more robust correlation filter template by introducing spatial regularization and temporal-aware components into the objective function. The adaptive spatial regularization provides a more robust appearance model to handle large appearance changes over time, while the temporal-aware constraint enhances the temporal continuity and consistency of the model. Together they make the correlation filter model more discriminative and reduce the influence of boundary effects during tracking. Secondly, the objective function can be transformed into three sub-problems with closed-form solutions and solved efficiently via the alternating direction method of multipliers (ADMM). Finally, we compare our tracker with representative methods on three benchmarks, OTB2015, VOT2018, and LaSOT, where the experimental results demonstrate the superiority of our tracker on most performance criteria compared with existing trackers.
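For orientation, a generic objective from this family of trackers (in the spirit of spatially regularized, temporal-aware DCFs, not necessarily the authors' exact formulation) can be written as:

```latex
\min_{\mathbf{f}} \; \frac{1}{2}\Bigl\lVert \mathbf{y} - \sum_{d=1}^{D} \mathbf{x}^{d} \ast \mathbf{f}^{d} \Bigr\rVert_{2}^{2}
  + \frac{1}{2}\sum_{d=1}^{D}\bigl\lVert \mathbf{w} \odot \mathbf{f}^{d} \bigr\rVert_{2}^{2}
  + \frac{\mu}{2}\,\bigl\lVert \mathbf{f} - \mathbf{f}_{t-1} \bigr\rVert_{2}^{2}
```

Here y is the desired response, x^d the d-th feature channel, w an adaptive spatial weight map, f_{t-1} the filter learned in the previous frame, and mu the temporal weight; objectives of this form split naturally into sub-problems that ADMM can solve with closed-form updates.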

18 pages, 3842 KiB  
Article
Deep Learning-Based Plant Classification Using Nonaligned Thermal and Visible Light Images
by Ganbayar Batchuluun, Se Hyun Nam and Kang Ryoung Park
Mathematics 2022, 10(21), 4053; https://doi.org/10.3390/math10214053 - 01 Nov 2022
Cited by 2 | Viewed by 1690
Abstract
Various studies have been conducted on plant images. Machine learning algorithms are usually used in visible light image-based studies, whereas in thermal image-based studies, the acquired thermal images tend to be analyzed by naked-eye visual examination. However, visible light cameras are sensitive to light and cannot be used in environments with low illumination. Although thermal cameras do not suffer from these drawbacks, they are sensitive to atmospheric temperature and humidity. Moreover, previous thermal camera-based studies relied on time-consuming manual analyses. Therefore, in this study, we conducted a novel study that simultaneously uses thermal images and corresponding visible light images of plants to solve these problems. The proposed network extracted features from each thermal image and the corresponding visible light image of a plant through residual block-based branch networks, and combined the features to increase the accuracy of multiclass classification. Additionally, a new database was built in this study by acquiring thermal images and corresponding visible light images of various plants.
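A minimal PyTorch sketch of the two-branch idea is shown below: one residual backbone per modality, with features concatenated before a classifier. Using torchvision ResNet-18 backbones and plain concatenation is an assumption; the paper designs its own residual-block branch networks.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DualModalityPlantClassifier(nn.Module):
    """Hedged sketch of a thermal + visible light two-branch classifier."""

    def __init__(self, num_classes: int):
        super().__init__()

        def branch():
            net = models.resnet18(weights=None)
            net.fc = nn.Identity()          # keep the 512-d pooled features
            return net

        self.thermal_branch, self.visible_branch = branch(), branch()
        self.classifier = nn.Linear(512 * 2, num_classes)

    def forward(self, thermal, visible):
        # assumes both inputs are 3-channel tensors (replicate the thermal channel if needed)
        f = torch.cat([self.thermal_branch(thermal),
                       self.visible_branch(visible)], dim=1)
        return self.classifier(f)
```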

15 pages, 1430 KiB  
Article
Interactive Learning of a Dual Convolution Neural Network for Multi-Modal Action Recognition
by Qingxia Li, Dali Gao, Qieshi Zhang, Wenhong Wei and Ziliang Ren
Mathematics 2022, 10(21), 3923; https://doi.org/10.3390/math10213923 - 22 Oct 2022
Viewed by 1197
Abstract
RGB and depth modalities contain abundant and mutually interactive information, and convolutional neural networks (ConvNets) based on multi-modal data have made successful progress in action recognition. Due to the limitations of a single stream, it is difficult to improve recognition performance by learning multi-modal interactive features. Inspired by multi-stream learning mechanisms and spatial-temporal information representation methods, we construct dynamic images using the rank pooling method and design an interactive learning dual-ConvNet (ILD-ConvNet) with a multiplexer module to improve action recognition performance. Built on rank pooling, the constructed visual dynamic images capture the spatial-temporal information of entire RGB videos. We extend this method to depth sequences to obtain richer multi-modal spatial-temporal information as the inputs of the ConvNets. In addition, we design a dual ILD-ConvNet with multiplexer modules to jointly learn interactive two-stream features from the RGB and depth modalities. The proposed recognition framework has been tested on two benchmark multi-modal datasets, NTU RGB + D 120 and PKU-MMD. The proposed ILD-ConvNet with a temporal segmentation mechanism achieves accuracies of 86.9% and 89.4% for Cross-Subject (C-Sub) and Cross-Setup (C-Set) on NTU RGB + D 120, and 92.0% and 93.1% for Cross-Subject (C-Sub) and Cross-View (C-View) on PKU-MMD, which are comparable with the state of the art. The experimental results show that the proposed ILD-ConvNet with a multiplexer module can extract interactive features from different modalities to enhance action recognition performance.
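The following sketch shows one common way to build a dynamic image from a clip by approximate rank pooling; the weighting formula is a widely used approximation and may differ from the rank pooling variant used in the paper.

```python
import torch

def approximate_rank_pooling(frames: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of summarising a clip into a single 'dynamic image'.

    frames: (T, C, H, W) -> (C, H, W). Each frame is weighted by
    alpha_t = 2t - T - 1 and summed, which emphasises temporal ordering;
    this is a common approximation of rank pooling, not necessarily the
    paper's exact formulation."""
    t = frames.shape[0]
    alphas = 2 * torch.arange(1, t + 1, dtype=frames.dtype) - t - 1   # (T,)
    return (alphas.view(-1, 1, 1, 1) * frames).sum(dim=0)
```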

22 pages, 6138 KiB  
Article
Deep-Learning-Based Complex Scene Text Detection Algorithm for Architectural Images
by Weiwei Sun, Huiqian Wang, Yi Lu, Jiasai Luo, Ting Liu, Jinzhao Lin, Yu Pang and Guo Zhang
Mathematics 2022, 10(20), 3914; https://doi.org/10.3390/math10203914 - 21 Oct 2022
Cited by 3 | Viewed by 1994
Abstract
With the advent of smart cities, text information in an image can be accurately located and recognized and then applied in fields such as instant translation, image retrieval, card surface information recognition, and license plate recognition, making people's lives and work more convenient. Owing to the varied orientations, angles, and shapes of text, identifying textual features in images is challenging. Therefore, we propose an improved EAST detector algorithm for detecting and recognizing slanted text in images. The proposed algorithm uses reinforcement learning to train a recurrent neural network controller. The optimal fully convolutional neural network structure is selected, and multi-scale features of text are extracted. After importing this information into the output module, the Generalized Intersection over Union algorithm is used to enhance the regression of the text bounding box. Next, the loss function is adjusted to ensure a balance between positive and negative sample classes before outputting the improved text detection results. Experimental results indicate that the proposed algorithm can address the problem of category homogenization and improve the low recall rate in target detection. Compared with other image detection algorithms, the proposed algorithm better identifies slanted text in natural scene images and also performs well on text in complex environments.
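For reference, the sketch below computes the Generalized Intersection over Union for axis-aligned boxes; slanted text boxes, as handled in the paper, would require polygon intersection instead.

```python
def generalized_iou(box_a, box_b):
    """GIoU = IoU - |C \ (A U B)| / |C|, with C the smallest enclosing box.
    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # smallest axis-aligned box enclosing both inputs
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (c_area - union) / c_area if c_area > 0 else iou
```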

30 pages, 5193 KiB  
Article
Ocular Biometrics with Low-Resolution Images Based on Ocular Super-Resolution CycleGAN
by Young Won Lee, Jung Soo Kim and Kang Ryoung Park
Mathematics 2022, 10(20), 3818; https://doi.org/10.3390/math10203818 - 16 Oct 2022
Cited by 3 | Viewed by 1214
Abstract
Iris recognition, which is known to have outstanding performance among conventional biometric techniques, requires a high-resolution camera and sufficient lighting to capture images containing various iris patterns. To address these issues, research is actively being conducted on ocular recognition, which includes the periocular region in addition to the iris region; this, however, also requires a high-resolution camera, limiting its applications due to cost and size constraints. Accordingly, this study proposes an ocular super-resolution cycle-consistent generative adversarial network (OSRCycleGAN) for ocular super-resolution reconstruction, together with a method to improve recognition performance when ocular images are acquired at low resolution. Experiments conducted on open databases, namely CASIA-Iris-Distance and CASIA-Iris-Lamp v4, and the IIT Delhi iris database, showed that the equal error rates of recognition of the proposed method were 3.02%, 4.06%, and 2.13%, respectively, outperforming state-of-the-art methods.
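A hedged sketch of the cycle-consistency term typically used when training a CycleGAN-style super-resolution model is given below; the adversarial and identity terms of the full objective are omitted, and the L1 loss and weight are assumptions. g_lr_to_hr and g_hr_to_lr are hypothetical generator callables.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(low_res, high_res, g_lr_to_hr, g_hr_to_lr, lam=10.0):
    """Mapping LR -> HR -> LR (and HR -> LR -> HR) should recover the input.
    The generators are assumed, user-provided networks."""
    rec_lr = g_hr_to_lr(g_lr_to_hr(low_res))     # reconstructed low-resolution image
    rec_hr = g_lr_to_hr(g_hr_to_lr(high_res))    # reconstructed high-resolution image
    return lam * (F.l1_loss(rec_lr, low_res) + F.l1_loss(rec_hr, high_res))
```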

17 pages, 2016 KiB  
Article
Deep Spatial-Temporal Neural Network for Dense Non-Rigid Structure from Motion
by Yaming Wang, Minjie Wang, Wenqing Huang, Xiaoping Ye and Mingfeng Jiang
Mathematics 2022, 10(20), 3794; https://doi.org/10.3390/math10203794 - 14 Oct 2022
Cited by 1 | Viewed by 1177
Abstract
Dense non-rigid structure from motion (NRSfM) has long been a challenge in computer vision because of the vast number of feature points. As neural networks develop rapidly, a novel solution is emerging. However, existing methods ignore the significance of spatial–temporal data and the strong learning capacity of neural networks. This study proposes a deep spatial–temporal NRSfM framework (DST-NRSfM) and introduces a weighted spatial constraint to further optimize the 3D reconstruction results. Layer normalization is applied in dense NRSfM tasks to prevent vanishing gradients and speed up neural network convergence. Our DST-NRSfM framework outperforms both classical approaches and recent advancements, achieving state-of-the-art performance across commonly used synthetic and real benchmark datasets.

19 pages, 3094 KiB  
Article
Multi-Level Cross-Modal Semantic Alignment Network for Video–Text Retrieval
by Fudong Nian, Ling Ding, Yuxia Hu and Yanhong Gu
Mathematics 2022, 10(18), 3346; https://doi.org/10.3390/math10183346 - 15 Sep 2022
Cited by 1 | Viewed by 1396
Abstract
This paper strives to improve the performance of video–text retrieval. To date, many algorithms have been proposed to facilitate the similarity measurement of video–text retrieval, moving from a single global semantic to multi-level semantics. However, these methods may suffer from the following limitations: (1) they largely ignore relationship semantics, so the modelled semantic levels are insufficient; (2) constraining the real-valued features of different modalities to lie in the same space only through feature distance measurement is incomplete; (3) they fail to handle the problem that the distributions of attribute labels at different semantic levels are heavily imbalanced. To overcome these limitations, this paper proposes a novel multi-level cross-modal semantic alignment network (MCSAN) for video–text retrieval by jointly modeling video–text similarity on the global, entity, action, and relationship semantic levels in a unified deep model. Specifically, both video and text are first decomposed into global, entity, action, and relationship semantic levels by carefully designed spatial–temporal semantic learning structures. Then, we utilize KLDivLoss and a cross-modal parameter-sharing attribute projection layer as statistical constraints to ensure that representations from different modalities at different semantic levels are projected into a common semantic space. In addition, a novel focal binary cross-entropy (FBCE) loss function is presented, which is the first effort to model the imbalanced attribute distribution problem for video–text retrieval. MCSAN effectively takes advantage of the complementary information among the four semantic levels. Extensive experiments on two challenging video–text retrieval datasets, MSR-VTT and VATEX, show the viability of our method.
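As an illustration of a focal variant of binary cross-entropy, in the spirit of the FBCE loss named above (the authors' exact weighting may differ), consider:

```python
import torch
import torch.nn.functional as F

def focal_binary_cross_entropy(logits, targets, gamma=2.0, alpha=0.25):
    """Hedged sketch: hard, misclassified attribute labels receive larger
    weights, easing the imbalance across attribute labels."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balance weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```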

31 pages, 3633 KiB  
Article
Boosting Unsupervised Dorsal Hand Vein Segmentation with U-Net Variants
by Szidónia Lefkovits, Simina Emerich and László Lefkovits
Mathematics 2022, 10(15), 2620; https://doi.org/10.3390/math10152620 - 27 Jul 2022
Cited by 2 | Viewed by 1506
Abstract
The identification of vascular network structures is one of the key fields of research in medical imaging. The segmentation of dorsal hand vein patterns from NIR images is not only the basis for reliable biometric identification but would also provide a significant tool for assisting medical interventions. Precise vein extraction would help medical workers to determine the exact needle entry point to gain intravenous access efficiently for different clinical purposes, such as intravenous therapy, parenteral nutrition, and blood analysis. It would also eliminate repeated needle pricks and could even facilitate an automatic injection procedure in the near future. In this paper, we present a combination of unsupervised and supervised dorsal hand vein segmentation from near-infrared images in the NCUT database. This approach is convenient given the lack of expert annotations in publicly available vein image databases. The novelty of our work is the automatic extraction of the veins in two phases. First, a geometrical approach identifies tubular structures corresponding to veins in the image. This step is considered gross segmentation and provides labels (Label I) for the second, CNN-based segmentation phase. We visually observed that different CNNs obtain better segmentations on the test set, which is why we built an ensemble segmentor based on majority voting over nine network architectures (U-Net, U-Net++ and U-Net3+, each trained with BCE, Dice and focal losses). The segmentation result of the ensemble is considered the second label (Label II). In our opinion, the new Label II is a better annotation of the NCUT database than the Label I obtained in the first step. The performance of computer vision algorithms based on artificial intelligence is determined by the quality and quantity of the labeled data used. We support this statement by training ResNet–UNet in the same manner with the two different label sets. In our experiments, the Dice scores, sensitivity and specificity of ResNet–UNet trained on Label II are superior to those of the same classifier trained on Label I. The measured Dice score of ResNet–UNet on the test set increases from 90.65% to 95.11%. It is worth mentioning that this article is one of very few in the domain of dorsal hand vein segmentation; moreover, it presents a general pipeline that may be applied to different medical image segmentation tasks.
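A minimal sketch of pixel-wise majority voting over the nine segmentors' binary masks:

```python
import numpy as np

def majority_vote(masks: np.ndarray) -> np.ndarray:
    """Combine binary vein masks by pixel-wise majority voting.
    masks: (n_models, H, W) array of {0, 1} predictions -> (H, W) mask."""
    votes = masks.sum(axis=0)                                # number of models voting "vein"
    return (votes > masks.shape[0] / 2).astype(np.uint8)     # strict majority
```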

18 pages, 2575 KiB  
Article
TransMF: Transformer-Based Multi-Scale Fusion Model for Crack Detection
by Xiaochen Ju, Xinxin Zhao and Shengsheng Qian
Mathematics 2022, 10(13), 2354; https://doi.org/10.3390/math10132354 - 05 Jul 2022
Cited by 8 | Viewed by 2097
Abstract
Cracks are widespread in infrastructure closely related to human activity. Using artificial intelligence to detect cracks automatically, known as crack detection, has become very popular. Background noise in crack images, the discontinuity of cracks, and other problems make crack detection a huge challenge. Although many approaches have been proposed, two challenges remain: (1) cracks are long and complex in shape, making it difficult to capture long-range continuity; (2) most images in crack datasets contain noise, and it is difficult to detect only the cracks while ignoring the noise. In this paper, we propose a novel method called the Transformer-based Multi-scale Fusion Model (TransMF) for crack detection, comprising an Encoder Module (EM), a Decoder Module (DM) and a Fusion Module (FM). The Encoder Module uses a hybrid of convolution blocks and Swin Transformer blocks to model the long-range dependencies of different parts of a crack image from local and global perspectives. The Decoder Module has a structure symmetrical to that of the Encoder Module. In the Fusion Module, the outputs of each layer of the Encoder Module and Decoder Module, at their unique scales, are fused by convolution, which reduces the effect of background noise and strengthens the correlations between relevant context to enhance crack detection. Finally, the outputs of each layer of the Fusion Module are concatenated to produce the crack detection result. Extensive experiments on three benchmark datasets (CrackLS315, CRKWH100 and DeepCrack) demonstrate that the proposed TransMF exceeds the best performance of present baselines.
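One plausible form of the per-scale fusion step is sketched below: same-scale encoder and decoder feature maps are concatenated and fused by convolution. The kernel sizes and single-channel output are assumptions, not the exact Fusion Module design.

```python
import torch
import torch.nn as nn

class LayerFusion(nn.Module):
    """Hedged sketch of fusing same-scale encoder and decoder features by convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),   # per-scale crack response map
        )

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor):
        return self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
```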

17 pages, 680 KiB  
Article
Multimedia Applications Processing and Computation Resource Allocation in MEC-Assisted SIoT Systems with DVS
by Xianwei Li, Guolong Chen, Liang Zhao and Bo Wei
Mathematics 2022, 10(9), 1593; https://doi.org/10.3390/math10091593 - 07 May 2022
Cited by 1 | Viewed by 1310
Abstract
Due to advancements in information technologies and the Internet of Things (IoT), the number of distributed sensors and IoT devices in social IoT (SIoT) systems is proliferating. This has led to various multimedia applications, such as face recognition and augmented reality (AR). These applications are computation-intensive and delay-sensitive and have become popular in our daily life. However, IoT devices are well known for their constrained computational resources, which hinders the execution of these applications. Mobile edge computing (MEC) has emerged as a promising paradigm to solve this issue. Migrating the applications of IoT devices to the edge cloud for execution not only provides computational resources to process these applications but also lowers the transmission latency between the IoT devices and the edge cloud. In this paper, computation resource allocation and multimedia application offloading in MEC-assisted SIoT systems are investigated. We aim to optimize resource allocation and application offloading by jointly minimizing the execution latency of multimedia applications and the energy consumed by IoT devices. The studied problem is formulated as a total computation overhead minimization problem that optimizes the computational resources of the edge servers. In addition, as dynamic voltage scaling (DVS) offers more flexibility for MEC system design, we incorporate it into the application offloading. Since the studied problem is a mixed-integer nonlinear programming (MINP) problem, an efficient method is proposed to address it. Theoretical analysis and simulation results demonstrate that, compared with baseline schemes, the proposed multimedia application offloading method improves the performance of MEC-assisted SIoT systems in most cases.

13 pages, 1609 KiB  
Article
A Novel Method of Chinese Herbal Medicine Classification Based on Mutual Learning
by Meng Han, Jilin Zhang, Yan Zeng, Fei Hao and Yongjian Ren
Mathematics 2022, 10(9), 1557; https://doi.org/10.3390/math10091557 - 05 May 2022
Cited by 2 | Viewed by 3231
Abstract
Chinese herbal medicine classification is an important research task in intelligent medicine and has been applied widely in fields such as smart medicinal material sorting and medicinal material recommendation. However, most current mainstream methods are semi-automatic, with low efficiency and poor performance. To tackle this problem, a novel Chinese herbal medicine classification method based on mutual learning is proposed. Specifically, two small student networks are designed for collaborative learning, each of which collects knowledge learned from the other. Consequently, the student networks obtain rich and reliable features, which further improves the performance of Chinese herbal medicine classification. To validate the performance of the proposed model, a dataset with 100 Chinese herbal classes (about 10,000 samples) was utilized and extensive experiments were performed. Experimental results verify that the proposed method is superior to the latest models with equivalent or even fewer parameters, obtaining a 3∼5.4% higher accuracy rate and a 13∼37% lower loss. Moreover, the mutual learning model achieves 80.8% Chinese herbal medicine classification accuracy.
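The sketch below illustrates a deep-mutual-learning objective in the spirit of this abstract: each student minimises its own cross-entropy plus a KL term toward the other student's softened predictions. The temperature, equal weighting, and use of detach are assumptions.

```python
import torch.nn.functional as F

def mutual_learning_losses(logits_a, logits_b, targets, t=1.0):
    """Hedged sketch of mutual learning between two student networks; returns one loss per student."""
    ce_a = F.cross_entropy(logits_a, targets)
    ce_b = F.cross_entropy(logits_b, targets)
    # each student mimics the other's (detached) prediction distribution
    kl_a = F.kl_div(F.log_softmax(logits_a / t, dim=1),
                    F.softmax(logits_b.detach() / t, dim=1), reduction="batchmean")
    kl_b = F.kl_div(F.log_softmax(logits_b / t, dim=1),
                    F.softmax(logits_a.detach() / t, dim=1), reduction="batchmean")
    return ce_a + kl_a, ce_b + kl_b
```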

17 pages, 4168 KiB  
Article
A Medical Endoscope Image Enhancement Method Based on Improved Weighted Guided Filtering
by Guo Zhang, Jinzhao Lin, Enling Cao, Yu Pang and Weiwei Sun
Mathematics 2022, 10(9), 1423; https://doi.org/10.3390/math10091423 - 23 Apr 2022
Cited by 9 | Viewed by 2335
Abstract
In clinical surgery, the quality of endoscopic images is degraded by noise. Blood, illumination changes, specular reflection, smoke, and other factors contribute to noise, which reduces image quality in occluded areas, affects doctors' judgment, prolongs the operation, and increases operative risk. In this study, we propose an improved weighted guided filtering algorithm to enhance tissue in endoscopic images. An unsharp mask algorithm and an improved weighted guided filter are used to enhance vessel details and contours in endoscopic images. The complete endoscopic image processing scheme, which includes detail enhancement, contrast enhancement, brightness enhancement, and highlight-area removal, is presented. Compared with other algorithms, the proposed algorithm maintains edges and reduces halos efficiently, and its effectiveness is demonstrated experimentally. The peak signal-to-noise ratio and structural similarity of endoscopic images obtained using the proposed algorithm were the highest, and the foreground–background detail variance–background variance measure also improved. The proposed algorithm has a strong ability to suppress noise and maintains the structure of the original endoscopic images, improving the details of tissue blood vessels. The findings of this study can provide guidelines for developing endoscopy devices.
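A minimal sketch of the unsharp-mask step is shown below, using a Gaussian blur as the smooth base layer; the paper instead uses an improved weighted guided filter as the edge-preserving smoother, which better maintains edges and reduces halos.

```python
import cv2
import numpy as np

def unsharp_mask(image: np.ndarray, blur_ksize=(5, 5), amount=1.5) -> np.ndarray:
    """Hedged sketch of unsharp masking for detail enhancement."""
    base = cv2.GaussianBlur(image, blur_ksize, 0)                 # smooth base layer
    # output = image + amount * (image - base), with saturation handled by OpenCV
    return cv2.addWeighted(image, 1.0 + amount, base, -amount, 0)
```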

17 pages, 5410 KiB  
Article
Scale and Background Aware Asymmetric Bilateral Network for Unconstrained Image Crowd Counting
by Gang Lv, Yushan Xu, Zuchang Ma, Yining Sun and Fudong Nian
Mathematics 2022, 10(7), 1053; https://doi.org/10.3390/math10071053 - 25 Mar 2022
Viewed by 1650
Abstract
This paper addresses two challenging problems of image-based crowd counting: scale variation and complex backgrounds. To that end, we present a novel crowd counting method, called the Scale and Background aware Asymmetric Bilateral Network (SBAB-Net), which handles scale variation and background noise in a unified framework. Specifically, the proposed SBAB-Net contains three main components: a pre-trained backbone convolutional neural network (CNN) as the feature extractor and two asymmetric branches that generate a density map. The two asymmetric branches have different structures and use features from different semantic layers. One branch is a densely connected stacked dilated convolution (DCSDC) sub-network with different dilation rates, which relies on a deep feature layer and can handle scale variation. The other branch is a parameter-free densely connected stacked pooling (DCSP) sub-network with various pooling kernels and strides, which relies on shallow features and can fuse features with several receptive fields to reduce the impact of background noise. The two sub-networks are fused by an attention mechanism to generate the final density map. Extensive experimental results on three widely used benchmark datasets demonstrate the effectiveness and superiority of the proposed method: (1) we achieve competitive counting performance compared to state-of-the-art methods; (2) compared with the baseline, the MAE and MSE are decreased by at least 6.3% and 11.3%, respectively.
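The sketch below shows one plausible form of a densely connected stack of dilated convolutions in the spirit of the DCSDC branch; the dilation rates, growth width, and ReLU activations are assumptions.

```python
import torch
import torch.nn as nn

class DenseDilatedBranch(nn.Module):
    """Hedged sketch: each layer sees the concatenation of all previous outputs
    and uses a growing dilation rate to enlarge the receptive field."""

    def __init__(self, in_channels: int, growth: int = 64, rates=(1, 2, 4, 8)):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for r in rates:
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3, padding=r, dilation=r),
                nn.ReLU(inplace=True),
            ))
            channels += growth   # dense connectivity grows the input width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats[1:], dim=1)   # concatenated multi-scale features
```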
