Special Issue "AI-Based Image Processing"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 April 2023 | Viewed by 10512

Special Issue Editors

Prof. Dr. Xinwei Yao
Guest Editor
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
Interests: artificial intelligence; Internet of Things; robots
Dr. Yougang Sun
Guest Editor
1. Institute of Rail Transit, Tongji University, Shanghai 201804, China
2. The National Maglev Transportation Engineering R&D Center, Tongji University, Shanghai 201804, China
Interests: maglev vehicle dynamics; nonlinear control method; multibody dynamics modeling; vehicle-rail coupling vibration; fuzzy logic control; reinforcement learning control
Prof. Dr. Xiaogang Jin
Guest Editor
State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
Interests: GAN-based facial editing; deep cloth animation; deep portrait editing; special effects simulation for motion picture; cloth animation and virtual try-on; traffic simulation and autonomous driving; crowd and group animation; implicit surface modeling and applications; creative modeling; sketch-based modeling; physically-based animation; image processing

Special Issue Information

Dear Colleagues,

With the rapid development of artificial intelligence, a wide range of AI algorithms have been proposed, and intelligent applications have started to play an increasingly important role in industrial production and in our social lives. Nevertheless, applying AI to medical image processing, image processing for intelligent transportation systems, satellite image processing, face recognition, object recognition, and related areas remains challenging.

Therefore, a Special Issue on “AI-Based Image Processing” has been organized to address these pressing problems, focusing in particular on the topics listed below:

  • Image processing algorithms;
  • Image analytics;
  • Medical image processing;
  • Biomedical image analysis;
  • Image generation;
  • Image restoration and enhancement;
  • Image compression;
  • Edge detection;
  • Image segmentation;
  • Semantic segmentation;
  • Image classification;
  • Image inpainting;
  • Image captioning;
  • Feature detection and extraction;
  • Content-based image retrieval;
  • Optical character recognition;
  • Face recognition;
  • Emotion recognition;
  • Gesture recognition;
  • Object recognition and tracking.

Prof. Dr. Xinwei Yao
Dr. Yougang Sun
Prof. Dr. Xiaogang Jin
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2300 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • feature detection and extraction
  • object recognition and tracking

Published Papers (21 papers)


Research


Article
Imitative Reinforcement Learning Fusing Mask R-CNN Perception Algorithms
Appl. Sci. 2022, 12(22), 11821; https://doi.org/10.3390/app122211821 - 21 Nov 2022
Viewed by 230
Abstract
Autonomous urban driving navigation is still an open problem with ample room for improvement in unknown, complex environments. This paper proposes an end-to-end autonomous driving approach that combines Conditional Imitation Learning (CIL) and Mask R-CNN with DDPG. In the first stage, data acquisition is performed using CARLA, a high-fidelity driving simulator. The data collected in CARLA are used to train a Mask R-CNN network for object detection and segmentation, and the segmented images are passed to the backbone of CIL to perform supervised Imitation Learning (IL). In the second stage, DDPG-based Reinforcement Learning continues the training, sharing the learned weights of the pre-trained CIL model. Combining the two methods in this way makes it possible to speed up training considerably and to reach levels of performance beyond those of humans. We conduct experiments on the CARLA urban driving benchmark. In the final experiments, our algorithm outperforms the original MP by 30%, CIL by 33%, and CIRL by 10% on the most difficult tasks, namely dynamic navigation tasks in new environments and new weather, demonstrating that the proposed two-stage framework generalizes remarkably well to unknown environments on navigation tasks. Full article
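For readers who want a concrete picture of the two-stage idea, the following is a minimal PyTorch sketch: a policy is first trained by behaviour cloning and its weights are then reused as the DDPG actor. The network sizes, the dummy data, and the omission of the critic update and replay buffer are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the two-stage training idea (illustrative only):
# stage 1 trains a driving policy by imitation, stage 2 reuses its weights
# as the DDPG actor and continues training against a critic.
import torch
import torch.nn as nn

class Policy(nn.Module):                      # stands in for the CIL backbone
    def __init__(self, obs_dim=64, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

obs = torch.randn(256, 64)                    # dummy segmented-image features
expert_act = torch.randn(256, 2).clamp(-1, 1) # dummy expert steering/throttle

# Stage 1: supervised imitation learning (behaviour cloning).
policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.mse_loss(policy(obs), expert_act)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: DDPG-style fine-tuning, actor initialised from the IL weights.
# (Critic training from a replay buffer is omitted in this sketch.)
actor = Policy()
actor.load_state_dict(policy.state_dict())    # weight sharing between stages
critic = nn.Sequential(nn.Linear(64 + 2, 128), nn.ReLU(), nn.Linear(128, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
for _ in range(50):
    q = critic(torch.cat([obs, actor(obs)], dim=1))
    actor_loss = -q.mean()                    # ascend the critic's Q estimate
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```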

Article
ViT-Cap: A Novel Vision Transformer-Based Capsule Network Model for Finger Vein Recognition
Appl. Sci. 2022, 12(20), 10364; https://doi.org/10.3390/app122010364 - 14 Oct 2022
Viewed by 274
Abstract
Finger vein recognition has been widely studied due to its advantages, such as high security, convenience, and living body recognition. At present, the performance of the most advanced finger vein recognition methods largely depends on the quality of finger vein images. However, when collecting finger vein images, due to possible deviations in finger position, ambient lighting, and other factors, the quality of the captured images is often relatively low, which directly affects the performance of finger vein recognition. In this study, we proposed a new model for finger vein recognition that combined the vision transformer architecture with the capsule network (ViT-Cap). The model can explore finger vein image information based on global and local attention and selectively focus on the important finger vein feature information. First, we split finger vein images into patches and then linearly embedded each of the patches. Second, the resulting vector sequence was fed into a transformer encoder to extract the finger vein features. Third, the feature vectors generated by the vision transformer module were fed into the capsule module for further training. We tested the proposed method on four publicly available finger vein databases. Experimental results showed that the average recognition accuracy of the algorithm based on the proposed model was above 96%, which was better than the original vision transformer, capsule network, and other advanced finger vein recognition algorithms. Moreover, the equal error rate (EER) of our model achieved state-of-the-art performance, reaching less than 0.3% on the FV-USM dataset, which proves the effectiveness and reliability of the proposed model in finger vein recognition. Full article
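The pipeline sketched in the abstract (patch embedding, transformer encoder, capsule-style head) can be outlined roughly as follows; this is a hedged sketch in which the layer sizes are arbitrary and the capsule module is reduced to per-class vectors whose lengths act as scores, omitting the dynamic-routing details of the real capsule network.

```python
# Rough sketch of a ViT-Cap-style pipeline: patch embedding -> transformer
# encoder -> capsule-style head whose per-class vector lengths are the scores.
import torch
import torch.nn as nn

class ViTCapSketch(nn.Module):
    def __init__(self, img=64, patch=8, dim=128, n_classes=100, caps_dim=16):
        super().__init__()
        self.patchify = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        n_patches = (img // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # One "capsule" vector per class; the class score is the vector length.
        self.caps = nn.Linear(dim, n_classes * caps_dim)
        self.n_classes, self.caps_dim = n_classes, caps_dim

    def forward(self, x):                        # x: (B, 1, 64, 64) vein image
        tokens = self.patchify(x).flatten(2).transpose(1, 2) + self.pos
        feats = self.encoder(tokens).mean(dim=1)           # global vein feature
        caps = self.caps(feats).view(-1, self.n_classes, self.caps_dim)
        return caps.norm(dim=-1)                 # per-class capsule lengths

scores = ViTCapSketch()(torch.randn(4, 1, 64, 64))
print(scores.shape)                              # torch.Size([4, 100])
```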

Article
Nighttime Image Dehazing Based on Point Light Sources
Appl. Sci. 2022, 12(20), 10222; https://doi.org/10.3390/app122010222 - 11 Oct 2022
Viewed by 378
Abstract
Images routinely suffer from quality degradation in fog, mist, and other harsh weather conditions; consequently, image dehazing is an essential pre-processing step in computer vision tasks. Image quality enhancement for special scenes, especially nighttime image dehazing, is intensively studied for unmanned driving and nighttime surveillance, whereas the vast majority of past dehazing algorithms are applicable only to daytime conditions. Observation of a large number of nighttime images shows that artificial light sources take over the role played by the sun in daytime images and that the influence of a light source on a pixel varies with distance. This paper proposes a novel nighttime dehazing method based on a light source influence matrix. A luminosity map expresses the photometric differences introduced by the light sources in the image. The light source influence matrix is then computed to divide the image into a near-light-source region and a non-near-light-source region. Using these two regions, the two initial transmittance maps obtained by the dark channel prior are fused by edge-preserving filtering. For the atmospheric light term, the initial atmospheric light value is corrected by the light source influence matrix. Finally, the dehazed result is obtained by substituting these estimates into the atmospheric scattering model. Theoretical analysis and comparative experiments verify the performance of the proposed method. In terms of PSNR, SSIM, and UQI, it improves by 9.4%, 11.2%, and 3.3% over the existing nighttime defogging method OSPF. In future work, we will extend the method from static image dehazing to real-time video stream dehazing for use in potential detection applications. Full article
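The fusion step can be illustrated with a small NumPy sketch: two transmittance estimates are blended with a light-source influence weight and the atmospheric scattering model is inverted. The influence map, transmission values, and airlight correction below are synthetic placeholders rather than the paper's actual luminosity-map and dark-channel computations.

```python
# Illustrative fusion of two transmission estimates using a light-source
# influence weight, then inversion of the scattering model I = J*t + A*(1 - t).
import numpy as np

h, w = 240, 320
hazy = np.random.rand(h, w, 3)            # stand-in nighttime hazy image
t_near = np.full((h, w), 0.8)             # transmission near light sources
t_far = np.full((h, w), 0.4)              # transmission away from sources

# Influence of a single artificial light source falls off with distance.
yy, xx = np.mgrid[0:h, 0:w]
dist = np.hypot(yy - h / 2, xx - w / 2)
influence = np.exp(-dist / 60.0)          # ~1 near the source, ~0 far away

t = influence * t_near + (1 - influence) * t_far     # fused transmission map
A = 0.7 * (1 - 0.3 * influence[..., None])           # locally corrected airlight

dehazed = (hazy - A) / np.clip(t[..., None], 0.1, 1.0) + A
dehazed = np.clip(dehazed, 0.0, 1.0)
print(dehazed.shape)
```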

Article
Image-Caption Model Based on Fusion Feature
Appl. Sci. 2022, 12(19), 9861; https://doi.org/10.3390/app12199861 - 30 Sep 2022
Viewed by 357
Abstract
The encoder–decoder framework is the main frame of image captioning. The convolutional neural network (CNN) is usually used to extract grid-level features of the image, and the graph convolutional neural network (GCN) is used to extract the image’s region-level features. Grid-level features are poor in semantic information, such as the relationship and location of objects, while regional features lack fine-grained information about images. To address this problem, this paper proposes a fusion-features-based image-captioning model, which includes the fusion feature encoder and LSTM decoder. The fusion-feature encoder is divided into grid-level feature encoder and region-level feature encoder. The grid-level feature encoder is a convoluted neural network embedded in squeeze and excitation operations so that the model can focus on features that are highly correlated to the title. The region-level encoder employs node-embedding matrices to enable models to understand different node types and gain richer semantics. Then the features are weighted together by an attention mechanism to guide the decoder LSTM to generate an image caption. Our model was trained and tested in the MS COCO2014 dataset with the experimental evaluation standard Bleu-4 score and CIDEr score of 0.399 and 1.311, respectively. The experimental results indicate that the model can describe the image in detail. Full article

Article
Adaptive Hybrid Storage Format for Sparse Matrix–Vector Multiplication on Multi-Core SIMD CPUs
Appl. Sci. 2022, 12(19), 9812; https://doi.org/10.3390/app12199812 - 29 Sep 2022
Viewed by 251
Abstract
Optimizing sparse matrix–vector multiplication (SpMV) is challenging due to the non-uniform distribution of the non-zero elements of the sparse matrix. The best-performing SpMV format changes depending on the input matrix and the underlying architecture, and there is no “one-size-fit-for-all” format. A hybrid scheme combining multiple SpMV storage formats allows one to choose an appropriate format to use for the target matrix and hardware. However, existing hybrid approaches are inadequate for utilizing the SIMD cores of modern multi-core CPUs with SIMDs, and it remains unclear how to best mix different SpMV formats for a given matrix. This paper presents a new hybrid storage format for sparse matrices, specifically targeting multi-core CPUs with SIMDs. Our approach partitions the target sparse matrix into two segmentations based on the regularities of the memory access pattern, where each segmentation is stored in a format suitable for its memory access patterns. Unlike prior hybrid storage schemes that rely on the user to determine the data partition among storage formats, we employ machine learning to build a predictive model to automatically determine the partition threshold on a per matrix basis. Our predictive model is first trained off line, and the trained model can be applied to any new, unseen sparse matrix. We apply our approach to 956 matrices and evaluate its performance on three distinct multi-core CPU platforms: a 72-core Intel Knights Landing (KNL) CPU, a 128-core AMD EPYC CPU, and a 64-core Phytium ARMv8 CPU. Experimental results show that our hybrid scheme, combined with the predictive model, outperforms the best-performing alternative by 2.9%, 17.5% and 16% on average on KNL, AMD, and Phytium, respectively. Full article
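A row-wise hybrid partition of this kind can be sketched as follows; the fixed threshold stands in for the paper's learned per-matrix predictor, and the padded ELL-style segment is only one of several possible regular formats.

```python
# Minimal hybrid SpMV sketch: rows with at most `threshold` non-zeros go to a
# padded ELL-style structure (regular, SIMD-friendly accesses), the remaining
# rows stay in CSR. The threshold is a placeholder for the learned predictor.
import numpy as np
import scipy.sparse as sp

A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.random.rand(1000)
threshold = 12                                   # would come from the ML model

nnz_per_row = np.diff(A.indptr)
ell_rows = np.where(nnz_per_row <= threshold)[0]
csr_rows = np.where(nnz_per_row > threshold)[0]

# ELL-style segment: pad every selected row to `threshold` entries.
ell_idx = np.zeros((len(ell_rows), threshold), dtype=np.int64)
ell_val = np.zeros((len(ell_rows), threshold))
for i, r in enumerate(ell_rows):
    s, e = A.indptr[r], A.indptr[r + 1]
    ell_idx[i, : e - s] = A.indices[s:e]
    ell_val[i, : e - s] = A.data[s:e]

y = np.zeros(A.shape[0])
y[ell_rows] = (ell_val * x[ell_idx]).sum(axis=1)   # regular segment
y[csr_rows] = A[csr_rows] @ x                      # irregular segment

assert np.allclose(y, A @ x)                       # matches plain SpMV
```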

Article
Research on CNN-BiLSTM Fall Detection Algorithm Based on Improved Attention Mechanism
Appl. Sci. 2022, 12(19), 9671; https://doi.org/10.3390/app12199671 - 26 Sep 2022
Viewed by 321
Abstract
Falls are one of the significant causes of accidental injuries to the elderly. With the rapid growth of the elderly population, fall detection has become a critical issue in the medical and healthcare fields. In this paper, we propose a model based on an improved attention mechanism, CBAM-IAM-CNN-BiLSTM, to detect falls of the elderly accurately and in time. The model includes a convolution layer, bidirectional LSTM layer, sampling layer and dense layer, and incorporates the improved convolutional attention block module (CBAM) into the network structure so that the one-dimensional convolution layer replaces the dense layer to aggregate the information from channels, which allows the model to accurately extract different behavior characteristics. The acceleration and angular velocity data of the human body, collected by wearable sensors, are respectively input into the convolution layer and bidirectional LSTM layer of the model and then classified and identified by softmax after feature fusion. Based on comparison with models such as CNN and CNN-BiLSTM, as well as with different attention mechanisms such as squeeze-and-excitation (SE), efficient channel attention (ECA) and the convolutional block attention module (CBAM), this model improves the accuracy, sensitivity and specificity to varying degrees. The experimental results showed that the accuracy, sensitivity and specificity of the CBAM-IAM-CNN-BiLSTM model proposed in this paper were 97.37%, 97.29% and 99.56%, respectively, which proves that the model has good practicability and strong generalization ability. Full article
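The improved attention idea, a 1-D convolution aggregating channel statistics in place of a dense layer, followed by CBAM-style spatial attention, might look roughly like the sketch below; the kernel sizes and the 1-D sensor-feature layout are assumptions for illustration, not the paper's exact design.

```python
# Channel attention via a 1-D convolution over pooled channel statistics,
# followed by CBAM-style spatial (here temporal) attention.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, k=3):
        super().__init__()
        self.channel_conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2)
        self.spatial_conv = nn.Conv1d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                     # x: (B, C, T) sensor feature map
        # Channel attention: pool over time, mix channels with a 1-D conv.
        w = x.mean(dim=2, keepdim=True).transpose(1, 2)       # (B, 1, C)
        w = torch.sigmoid(self.channel_conv(w)).transpose(1, 2)
        x = x * w
        # Temporal attention over mean- and max-pooled channel maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)   # (B, 2, T)
        return x * torch.sigmoid(self.spatial_conv(s))

feat = torch.randn(8, 64, 128)                # batch of accelerometer features
print(ChannelSpatialAttention()(feat).shape)  # torch.Size([8, 64, 128])
```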

Article
High-Precision Depth Map Estimation from Missing Viewpoints for 360-Degree Digital Holography
Appl. Sci. 2022, 12(19), 9432; https://doi.org/10.3390/app12199432 - 20 Sep 2022
Viewed by 383
Abstract
In this paper, we propose a novel model to extract highly precise depth maps from missing viewpoints, especially for generating holographic 3D content. These depth maps are essential elements for phase extraction, which is required for the synthesis of computer-generated holograms (CGHs). The proposed model, called the holographic dense depth, estimates depth maps through feature extraction, combining up-sampling. We designed and prepared a total of 9832 multi-view images with resolutions of 640 × 360. We evaluated our model by comparing the estimated depth maps with their ground truths using various metrics. We further compared the CGH patterns created from estimated depth maps with those from ground truths and reconstructed the holographic 3D image scenes from their CGHs. Both quantitative and qualitative results demonstrate the effectiveness of the proposed method. Full article

Article
AdaCB: An Adaptive Gradient Method with Convergence Range Bound of Learning Rate
Appl. Sci. 2022, 12(18), 9389; https://doi.org/10.3390/app12189389 - 19 Sep 2022
Viewed by 414
Abstract
Adaptive gradient descent methods such as Adam, RMSprop, and AdaGrad achieve great success in training deep learning models. These methods adaptively change the learning rates, resulting in a faster convergence speed. Recent studies have shown their problems include extreme learning rates, non-convergence issues, as well as poor generalization. Some enhanced variants have been proposed, such as AMSGrad, and AdaBound. However, the performances of these alternatives are controversial and some drawbacks still occur. In this work, we proposed an optimizer called AdaCB, which limits the learning rates of Adam in a convergence range bound. The bound range is determined by the LR test, and then two bound functions are designed to constrain Adam, and two bound functions tend to a constant value. To evaluate our method, we carry out experiments on the image classification task, three models including Smallnet, Network IN Network, and Resnet are trained on CIFAR10 and CIFAR100 datasets. Experimental results show that our method outperforms other optimizers on CIFAR10 and CIFAR100 datasets with accuracies of (82.76%, 53.29%), (86.24%, 60.19%), and (83.24%, 55.04%) on Smallnet, Network IN Network and Resnet, respectively. The results also indicate that our method maintains a faster learning speed, like adaptive gradient methods, in the early stage and achieves considerable accuracy, like SGD (M), at the end. Full article
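The core mechanism, clipping Adam's per-parameter step size between two bound functions that converge to a constant, can be sketched as follows. The specific bound functions used here follow the familiar AdaBound-style form and are not the exact functions proposed in the paper.

```python
# Illustrative bounded Adam step: the per-parameter learning rate is clipped
# between a lower and an upper bound that both converge to a final constant.
import torch

def bounded_adam_step(p, grad, state, base_lr=1e-3, final_lr=0.1,
                      betas=(0.9, 0.999), eps=1e-8, gamma=1e-3):
    state["step"] += 1
    t = state["step"]
    m, v = state["m"], state["v"]
    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])             # 1st moment
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])   # 2nd moment
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    step_size = base_lr / (v_hat.sqrt() + eps)                  # Adam's raw LR
    lower = final_lr * (1 - 1 / (gamma * t + 1))                # both bounds ->
    upper = final_lr * (1 + 1 / (gamma * t))                    # final_lr as t grows
    step_size = step_size.clamp(min=lower, max=upper)           # convergence range
    p.sub_(step_size * m_hat)

w = torch.randn(10, requires_grad=True)
state = {"step": 0, "m": torch.zeros_like(w), "v": torch.zeros_like(w)}
for _ in range(100):
    loss = (w ** 2).sum()
    grad, = torch.autograd.grad(loss, w)
    with torch.no_grad():
        bounded_adam_step(w, grad, state)
print(float((w ** 2).sum()))   # should shrink toward 0
```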

Article
Simulation of Intellectual Property Management on Evolution Driving of Regional Economic Growth
Appl. Sci. 2022, 12(18), 9011; https://doi.org/10.3390/app12189011 - 08 Sep 2022
Viewed by 343
Abstract
The input, application, and transformation of intellectual property can significantly promote the economic development of a region, but the path and operating mechanism by which intellectual property management drives regional economic growth are not very clear. System dynamics theory was used to analyze the driving forces and resistance of intellectual property management from the macro to the micro level. With the help of system dynamics theory, equations were constructed to simulate the process path and forces from intellectual property management to regional economic growth, and sensitivity analysis was used to identify the sensitive influencing factors in the system. The following conclusions were drawn: (1) intellectual property affects regional economic growth at the macro level, through factors such as intellectual property investment, policies, and the construction of rules and regulations; (2) enterprises, whether in industry, in universities and research institutes, or elsewhere in this system, are the main body that creates innovation benefits and ultimately promotes regional economic growth; (3) the continuous investment of intellectual property resources and the driving force of enterprise innovation are all sensitive factors of this system. The government should give full play to its functions and strengthen the management of intellectual property in order to enable the regional economy to achieve high-quality development. By studying the cooperation between the actors involved in intellectual property management activities, the integration and allocation of factors and resources was examined, and the process and dynamics by which technological innovation activities act on economic growth were revealed. Full article
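As a very rough illustration of the system dynamics approach (not the paper's model), a stock-and-flow simulation can be integrated step by step as in the toy example below, where intellectual property investment feeds an IP stock that in turn drives regional output; every coefficient and functional form is an assumption made purely for illustration.

```python
# Toy stock-and-flow simulation: IP investment accumulates into an IP stock,
# which drives regional output with diminishing returns. Illustrative only.
import numpy as np

dt, steps = 0.25, 120                 # quarterly steps over 30 years
ip_stock, gdp = 10.0, 100.0
history = []
for k in range(steps):
    investment = 0.04 * gdp           # share of output invested in IP
    depreciation = 0.05 * ip_stock
    ip_stock += dt * (investment - depreciation)
    innovation_gain = 0.3 * np.sqrt(ip_stock)   # diminishing returns from IP
    gdp += dt * (0.02 * gdp + innovation_gain)  # baseline growth + IP effect
    history.append((k * dt, ip_stock, gdp))

print(f"final IP stock = {ip_stock:.1f}, final GDP = {gdp:.1f}")
```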

Article
An Unsupervised Depth-Estimation Model for Monocular Images Based on Perceptual Image Error Assessment
Appl. Sci. 2022, 12(17), 8829; https://doi.org/10.3390/app12178829 - 02 Sep 2022
Viewed by 293
Abstract
In this paper, we propose a novel unsupervised learning-based model for estimating the depth of monocular images by integrating a simple ResNet-based auto-encoder and some special loss functions. We use only stereo images obtained from binocular cameras as training data without using depth ground-truth data. Our model basically outputs a disparity map that is necessary to warp an input image to an image corresponding to a different viewpoint. When the input image is warped using the output-disparity map, distortions of various patterns inevitably occur in the reconstructed image. During the training process, the occurrence frequency and size of these distortions gradually decrease, while the similarity between the reconstructed and target images increases, which proves that the accuracy of the predicted disparity maps also increases. Therefore, one of the important factors in this type of training is an efficient loss function that accurately measures how much the difference in quality between the reconstructed and target images is and guides the gap to be properly and quickly closed as the training progresses. In recent related studies, the photometric difference was calculated through simple methods such as L1 and L2 loss or by combining one of these with a traditional computer vision-based hand-coded image-quality assessment algorithm such as SSIM. However, these methods have limitations in modeling various patterns at the level of the human visual system. Therefore, the proposed model uses a pre-trained perceptual image-quality assessment model that effectively mimics human-perception mechanisms to measure the quality of distorted images as image-reconstruction loss. In order to highlight the performance of the proposed loss functions, a simple ResNet50-based network is adopted in our model. We trained our model using stereo images of the KITTI 2015 driving dataset to measure the pixel-level depth for 768 × 384 images. Despite the simplicity of the network structure, thanks to the effectiveness of the proposed image-reconstruction loss, our model outperformed other state-of-the-art studies that have been trained in unsupervised methods on a variety of evaluation indicators. Full article
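The self-supervised objective can be sketched as follows: a predicted disparity warps the right image toward the left view and a perceptual model scores the reconstruction. The `perceptual_model` below is a stand-in (a frozen random CNN) for the pre-trained image-quality network used in the paper, and all shapes and loss weights are illustrative.

```python
# Sketch of a stereo self-supervision loss: warp the right view with the
# predicted disparity and score the reconstruction with a perceptual model.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_with_disparity(right, disp):
    """Sample the right image at x - disparity to reconstruct the left view."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    x_shift = xs.expand(b, h, w) - 2.0 * disp.squeeze(1) / w
    grid = torch.stack([x_shift, ys.expand(b, h, w)], dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

perceptual_model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(8, 8, 3, padding=1))  # stand-in IQA net
for p in perceptual_model.parameters():
    p.requires_grad_(False)

def reconstruction_loss(pred, target, alpha=0.85):
    perceptual = F.l1_loss(perceptual_model(pred), perceptual_model(target))
    return alpha * perceptual + (1 - alpha) * F.l1_loss(pred, target)

left = torch.rand(2, 3, 96, 192)
right = torch.rand(2, 3, 96, 192)
disp = (torch.rand(2, 1, 96, 192) * 5.0).requires_grad_()  # predicted disparity
loss = reconstruction_loss(warp_with_disparity(right, disp), left)
loss.backward()   # gradients flow back to the disparity prediction
```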

Article
Exploiting Hierarchical Label Information in an Attention-Embedding, Multi-Task, Multi-Grained, Network for Scene Classification of Remote Sensing Imagery
Appl. Sci. 2022, 12(17), 8705; https://doi.org/10.3390/app12178705 - 30 Aug 2022
Viewed by 407
Abstract
Remote sensing scene classification aims to automatically assign proper labels to remote sensing images. Most of the existing deep learning based methods usually consider the interclass and intraclass relationships of the image content for classification. However, these methods rarely consider the hierarchical information of scene labels, as a scene label may belong to hierarchically multi-grained levels. For example, multi-grained level labels may indicate that a remote sensing scene image may belong to the coarse-grained label “transportation land” while also belonging to the fine-grained label “airport”. In this paper, to exploit hierarchical label information, we propose an attention-embedding multi-task multi-grained network (AEMMN) for remote sensing scene classification. In the proposed AEMMN, we add a coarse-grained classifier as the first level and a fine-grained classifier as the second level to perform multi-task learning tasks. Additionally, a gradient control module is utilized to control the gradient propagation of two classifiers to suppress the negative transfer caused by the irrelevant features between tasks. In the feature extraction portion, the model uses an ECA module embedding Resnet50 to extract effective features with cross-channel interaction information. Furthermore, an external attention module is exploited to improve the discrimination of fine-grained and coarse-grained features. Experiments were conducted on the NWPU-RESISC45 and the Aerial Image Data Set (AID), and the overall accuracy of the proposed AEMMN is 92.07% on the NWPU-RESISC45 dataset and reached 94.96% on the AID. The results indicate that hierarchical label information can effectively improve the performance of scene classification tasks when categorizing remote sensing imagery. Full article
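A bare-bones version of the multi-grained setup, one shared backbone with coarse- and fine-grained heads trained jointly, is sketched below; the toy backbone and label counts are assumptions, and the ECA, external attention, and gradient-control modules of AEMMN are omitted.

```python
# Shared backbone with a coarse-grained head and a fine-grained head,
# trained jointly with two cross-entropy terms.
import torch
import torch.nn as nn

class MultiGrainedNet(nn.Module):
    def __init__(self, n_coarse=10, n_fine=45):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1),
                                      nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.coarse_head = nn.Linear(32, n_coarse)   # e.g. "transportation land"
        self.fine_head = nn.Linear(32, n_fine)       # e.g. "airport"

    def forward(self, x):
        f = self.backbone(x)
        return self.coarse_head(f), self.fine_head(f)

model = MultiGrainedNet()
images = torch.randn(8, 3, 64, 64)
coarse_y = torch.randint(0, 10, (8,))
fine_y = torch.randint(0, 45, (8,))
coarse_logits, fine_logits = model(images)
loss = (nn.functional.cross_entropy(coarse_logits, coarse_y)
        + nn.functional.cross_entropy(fine_logits, fine_y))   # joint objective
loss.backward()
```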

Article
Image Dehazing Algorithm Based on Deep Learning Coupled Local and Global Features
Appl. Sci. 2022, 12(17), 8552; https://doi.org/10.3390/app12178552 - 26 Aug 2022
Cited by 1 | Viewed by 369
Abstract
To address the problems that most convolutional neural network-based image defogging algorithm models capture incomplete global feature information and incomplete defogging, this paper proposes an end-to-end convolutional neural network and vision transformer hybrid image defogging algorithm. First, the shallow features of the haze image were extracted by a preprocessing module. Then, a symmetric network structure including a convolutional neural network (CNN) branch and a vision transformer branch was used to capture the local features and global features of the haze image, respectively. The mixed features were fused using convolutional layers to cover the global representation while retaining the local features. Finally, the features obtained by the encoder and decoder were fused to obtain richer feature information. The experimental results show that the proposed defogging algorithm achieved better defogging results in both the uniform and non-uniform haze datasets, solves the problems of dark and distorted colors after image defogging, and the recovered images are more natural for detail processing. Full article

Article
Method for 2D-3D Registration under Inverse Depth and Structural Semantic Constraints for Digital Twin City
Appl. Sci. 2022, 12(17), 8543; https://doi.org/10.3390/app12178543 - 26 Aug 2022
Viewed by 358
Abstract
A digital twin city maps a virtual three-dimensional (3D) city model to the geographic information system, constructs a virtual world, and integrates real sensor data to achieve the purpose of virtual–real fusion. Focusing on the accuracy problem of vision sensor registration in the virtual digital twin city scene, this study proposes a 2D-3D registration method under inverse depth and structural semantic constraints. First, perspective and inverse depth images of the virtual scene were obtained by using perspective view and inverse-depth nascent technology, and then the structural semantic features were extracted by the two-line minimal solution set method. A simultaneous matching and pose estimation method under inverse depth and structural semantic constraints was proposed to achieve the 2D-3D registration of real images and virtual scenes. The experimental results show that the proposed method can effectively optimize the initial vision sensor pose and achieve high-precision registration in the digital twin scene, and the Z-coordinate error is reduced by 45%. An application experiment of monocular image multi-object spatial positioning was designed, which proved the practicability of this method, and the influence of model data error on registration accuracy was analyzed. Full article

Article
Rwin-FPN++: Rwin Transformer with Feature Pyramid Network for Dense Scene Text Spotting
Appl. Sci. 2022, 12(17), 8488; https://doi.org/10.3390/app12178488 - 25 Aug 2022
Viewed by 370
Abstract
Scene text spotting has made tremendous progress with the in-depth research on deep convolutional neural networks (DCNN). Previous approaches mainly focus on the spotting of arbitrary-shaped scene text, on which it is difficult to achieve satisfactory results on dense scene text containing various instances of bending, occlusion, and lighting. To address this problem, we propose an approach called Rwin-FPN++, which incorporates the long-range dependency merit of the Rwin Transformer into the feature pyramid network (FPN) to effectively enhance the functionality and generalization of FPN. Specifically, we first propose the rotated windows-based Transformer (Rwin) to enhance the rotation-invariant performance of self-attention. Then, we attach the Rwin Transformer to each level on our feature pyramids to extract global self-attention contexts for each feature map produced by the FPN. Thirdly, we fuse these feature pyramids by upsampling to predict the score matrix and keypoints matrix of the text regions. Fourthly, a simple post-processing process is adopted to precisely merge the pixels in the score matrix and keypoints matrix and obtain the final segmentation results. Finally, we use the recurrent neural network to recognize each segmentation region and thus achieve the final spotting results. To evaluate the performance of our Rwin-FPN++ network, we construct a dense scene text dataset with various shapes and occlusion from the wiring of the terminal block of the substation panel cabinet. We train our Rwin-FPN++ network on public datasets and then evaluate the performance on our dense scene text dataset. Experiments demonstrate that our Rwin-FPN++ network can achieve an F-measure of 79% and outperform all other methods in F-measure by at least 2.8%. This is because our proposed method has better rotation invariance and long-range dependency merit. Full article

Article
Research on Rockburst Risk Level Prediction Method Based on LightGBM−TCN−RF
Appl. Sci. 2022, 12(16), 8226; https://doi.org/10.3390/app12168226 - 17 Aug 2022
Viewed by 356
Abstract
Rockburst hazards pose a severe threat to mine safety. To accurately predict the risk level of rockburst, a LightGBM−TCN−RF prediction model is proposed in this paper. The correlation coefficient heat map combined with the LightGBM feature selection algorithm is used to screen the rockburst characteristic variables and establish rockburst predicted characteristic variables. Then, the TCN prediction model with a better prediction performance is selected to predict the rockburst characteristic variables at time t + 1. The RF classification model of rockburst risk level with a better classification effect is used to classify the risk level of rockburst characteristic variables at time t + 1. The comparison experiments show that the rockburst characteristic variables after screening allow a more accurate prediction. The overall RMSE and MAE of the TCN prediction model are 0.124 and 0.079, which are better than those of RNN, LSTM, and GRU by about 0.1–2.5%. The accuracy of the RF classification model for the rockburst risk level is 96.17%, which is about 20% higher than that of KNN and SVM, and the model accuracy is improved by 1.62% after parameter tuning by the PSO algorithm. The experimental results show that the LightGBM−TCN−RF model can better classify and predict rockburst risk levels at future moments, which has a certain reference value for rockburst monitoring and early warning. Full article

Article
Intelligent Target Design Based on Complex Target Simulation
Appl. Sci. 2022, 12(16), 8010; https://doi.org/10.3390/app12168010 - 10 Aug 2022
Viewed by 388
Abstract
The emergence and popularization of various fifth-generation fighter jets with supersonic cruise, super maneuverability, and stealth functionalities have raised higher and more comprehensive challenges for the tactical performance and operational indicators of air defense weapon systems. The training of air defense systems requires simulated targets; however, the traditional targets cannot simulate the radar cross-section (RCS) distribution characteristics of fifth-generation fighter aircrafts. In addition, the existing target aircrafts are expensive and cannot be mass-produced. Therefore, in this paper, a corner reflector and a Luneburg ball reflector with RCS distribution characteristics of a fifth-generation fighter in a certain spatial area are designed for target simulation. Several corner reflectors and Luneburg balls are used to form an array to realize the simulations. The RCS value and distribution characteristics of the target can be combined with fuzzy clustering and a single-chip microcomputer to design an intelligent switching system, which improves the practicability of the intelligent target design proposed in this paper. Full article

Article
Numerical Analysis of Instability Mechanism of a High Slope under Excavation Unloading and Rainfall
Appl. Sci. 2022, 12(16), 7990; https://doi.org/10.3390/app12167990 - 10 Aug 2022
Cited by 1 | Viewed by 436
Abstract
High slope simulation analysis is an essential means of slope engineering design, construction, and operation management. It is necessary to master slope dynamics, ensure slope safety, analyze slope instability mechanisms, and carry out slope stability early warning and prediction. This paper, aiming at the landslide phenomenon of the high slope on the left bank of a reservoir project, considering the influence of stratum lithology, fault, excavation unloading, rainfall, and water storage, establishes a refined finite element model that reflects the internal structure of the slope. The fluid-solid coupling numerical simulation analysis of the high slope is carried out. Based on this, the failure mechanism of the slope under excavation unloading and heavy rainfall is explained. The application of an engineering example shows that under the combined action of excavation unloading and rainfall infiltration, the in-plane saturation of the structure formed at fault at the trailing edge of the excavation slope surface increases, the pore water pressure increases, and the shear strain concentration area appears at the internal structural surface of the slope. The shear strain concentration area extends along the structural surface to the front and rear edges of the slope, resulting in landslide damage. Full article

Article
STDecoder-CD: How to Decode the Hierarchical Transformer in Change Detection Tasks
Appl. Sci. 2022, 12(15), 7903; https://doi.org/10.3390/app12157903 - 06 Aug 2022
Cited by 1 | Viewed by 425
Abstract
Change detection (CD) is in demand in satellite imagery processing. Inspired by the recent success of the combined transformer-CNN (convolutional neural network) model, TransCNN, originally designed for image recognition, in this paper, we present STDecoder-CD for change detection applications, which is a combination of the Siamese network (“S”), the TransCNN backbone (“T”), and three types of decoders (“Decoder”). The Type I model uses a UNet-like decoder, and the Type II decoder is defined by a combination of three modules: the difference detector, FPN (feature pyramid network), and FCN (fully convolutional network). The Type III model updates the change feature map by introducing a transformer decoder. The effectiveness and advantages of the proposed methods over the state-of-the-art alternatives were demonstrated on several CD datasets, and experimental results indicate that: (1) STDecoder-CD has excellent generalization ability and has strong robustness to pseudo-changes and noise. (2) An end-to-end CD network architecture cannot be completely free from the influence of the decoding strategy. In our case, the Type I decoder often obtained finer details than Types II and III due to its multi-scale design. (3) Using the ablation or replacing strategy to modify the three proposed decoder architectures had a limited impact on the CD performance of STDecoder-CD. To the best of our knowledge, we are the first to investigate the effect of different decoding strategies on CD tasks. Full article
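The Siamese layout with a difference-based decoder (closest in spirit to the Type II decoder described above) can be sketched as follows, with a toy CNN standing in for the TransCNN backbone; it is an illustration of the general structure rather than the paper's architecture.

```python
# Weight-tied (Siamese) encoder over two image dates, followed by a simple
# difference decoder that predicts a per-pixel change map.
import torch
import torch.nn as nn

class SiameseCD(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                 # shared for both dates
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(                 # difference -> change map
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(16, 1, 1))

    def forward(self, img_t1, img_t2):
        diff = torch.abs(self.encoder(img_t1) - self.encoder(img_t2))
        return self.decoder(diff)                     # change logits per pixel

model = SiameseCD()
before = torch.randn(2, 3, 128, 128)
after = torch.randn(2, 3, 128, 128)
print(model(before, after).shape)    # torch.Size([2, 1, 128, 128])
```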

Article
Calibrated Convolution with Gaussian of Difference
Appl. Sci. 2022, 12(13), 6570; https://doi.org/10.3390/app12136570 - 29 Jun 2022
Viewed by 382
Abstract
Attention mechanisms are widely used for Convolutional Neural Networks (CNNs) when performing various visual tasks. Many methods introduce multi-scale information into attention mechanisms to improve their feature transformation performance; however, these methods do not take into account the potential importance of scale invariance. This paper proposes a novel type of convolution, called Calibrated Convolution with Gaussian of Difference (CCGD), that takes into account both the attention mechanisms and scale invariance. A simple yet effective scale-invariant attention module that operates within a single convolution is able to adaptively build powerful scale-invariant features to recalibrate the feature representation. Along with this, a CNN with a heterogeneously grouped structure is used, which enhances the multi-scale representation capability. CCGD can be flexibly deployed in modern CNN architectures without introducing extra parameters. During experimental tests on various datasets, the method increased the ResNet50-based classification accuracy from 76.40% to 77.87% on the ImageNet dataset, and the tests generally confirmed that CCGD can outperform other state-of-the-art attention methods. Full article

Article
Single Image Super-Resolution Method Based on an Improved Adversarial Generation Network
Appl. Sci. 2022, 12(12), 6067; https://doi.org/10.3390/app12126067 - 15 Jun 2022
Cited by 1 | Viewed by 402
Abstract
Super-Resolution (SR) techniques for image restoration have recently been gaining attention due to their excellent performance. Owing to their powerful learning abilities, Generative Adversarial Networks (GANs) have achieved great success in this area. In this paper, we propose an Enhanced Generative Adversarial Network (EGAN) to improve its effectiveness for real-time Super-Resolution tasks. The main contributions of this paper are as follows: (1) We adopted the Laplacian pyramid framework as a pre-trained module, which is beneficial for providing multiscale features for our input. (2) At each feature block, a convolutional skip-connection network, which may contain latent information, helps the generative model reconstruct a plausible-looking image. (3) Considering that edge details usually play an important role in image generation, a perceptual loss function was defined to train the network and seek the optimal parameters. Quantitative and qualitative evaluations demonstrate that our algorithm not only takes full advantage of Convolutional Neural Networks (CNNs) to improve image quality, but also performs better than other algorithms in speed and performance for real-time Super-Resolution tasks. Full article

Review


Review
Deep Residual Learning for Image Recognition: A Survey
Appl. Sci. 2022, 12(18), 8972; https://doi.org/10.3390/app12188972 - 07 Sep 2022
Cited by 3 | Viewed by 1776
Abstract
Deep Residual Networks have recently been shown to significantly improve the performance of neural networks trained on ImageNet, with results beating all previous methods on this dataset by large margins in the image classification task. However, the meaning of these impressive numbers and their implications for future research are not fully understood yet. In this survey, we will try to explain what Deep Residual Networks are, how they achieve their excellent results, and why their successful implementation in practice represents a significant advance over existing techniques. We also discuss some open questions related to residual learning as well as possible applications of Deep Residual Networks beyond ImageNet. Finally, we discuss some issues that still need to be resolved before deep residual learning can be applied on more complex problems. Full article
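For reference, a standard residual block of the kind the survey discusses looks like this in PyTorch: the block learns a residual mapping F(x) and adds it to an identity shortcut, so the output is F(x) + x. The 2-conv layout and channel count follow the common ResNet design.

```python
# Basic residual block: two conv layers learn F(x), the identity shortcut
# adds x back, and the block outputs relu(F(x) + x).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        return torch.relu(residual + x)     # identity shortcut: F(x) + x

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock()(x).shape)             # torch.Size([1, 64, 56, 56])
```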

Planned Papers

The below list represents only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer-review.

Title: APPLYING TERNION-STREAM DCNN IN REAL-TIME FOR VEHICLE RE-IDENTIFICATION AND TRACKING ACROSS MULTIPLE NON-OVERLAPPING CAMERAS
Author:
Highlights: The ternion-stream DCNN improves vehicle feature extraction and mapping, thereby also improving detection quality, tracking, and re-identification across multiple non-overlapping cameras in urban areas.
