AI-Driven Computer Vision and Pattern Recognition: Challenges and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 November 2025) | Viewed by 13233

Special Issue Editors


Prof. Dr. Heinz Dieter Fill
Guest Editor
1. Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
2. Department of Hydraulics and Sanitation, Technology Sector, Federal University of Paraná, Curitiba 81531-990, Brazil
Interests: computer vision; deep learning; Internet of Things (IoT); AI; big data; pattern recognition

Dr. Arianna D'Ulizia
Guest Editor
National Research Council, Institute of Research on Population and Social Policies (CNR-IRPPS), 00185 Rome, Italy
Interests: social computing; artificial intelligence; human-computer interaction; multimodal and natural language processing; user-centered interaction design; knowledge management and social innovation

Special Issue Information

Dear Colleagues,

In recent years, artificial intelligence has been revolutionizing the fields of computer vision and pattern recognition, driving significant advancements in various applications, including medical image analysis, autonomous systems, intelligent surveillance, and advanced robotics. However, integrating AI into these domains still presents numerous challenges, such as decision interpretability, model robustness, the need for high-quality labeled data, as well as ethical and security concerns.

This Special Issue is dedicated to exploring advanced solutions for AI-driven computer vision and pattern recognition, highlighting pioneering research, innovative methodologies and applications, and groundbreaking technologies that are advancing these fields.

We particularly welcome contributions that advance the frontiers of these fields through novel algorithms, deep learning architectures, optimization strategies, practical applications, and interdisciplinary approaches. Submissions may address, but are not limited to, the following topics:

  • Advancements in computer vision and pattern recognition algorithms;
  • Robustness and interpretability of AI models;
  • AI for image and video processing in complex environments;
  • Image generation and synthesis through deep generative models;
  • Self-supervised learning and few-shot learning techniques;
  • Applications of AI-driven computer vision and pattern recognition in critical domains, such as healthcare, security, Industry 4.0, and autonomous systems;
  • Ethical considerations and bias in AI-driven systems.

Prof. Dr. Heinz Dieter Fill
Dr. Arianna D'Ulizia
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • pattern recognition algorithms
  • smart environments
  • intelligent surveillance
  • medical image analysis
  • autonomous systems
  • advanced robotics
  • decision interpretability
  • model robustness

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research

21 pages, 37629 KB  
Article
FacadeGAN: Facade Texture Placement with GANs
by Elif Şanlıalp and Muhammed Abdullah Bulbul
Appl. Sci. 2026, 16(2), 860; https://doi.org/10.3390/app16020860 - 14 Jan 2026
Abstract
This study presents a texture-aware image synthesis framework designed to generate material-consistent façades using adversarial learning. The proposed architecture incorporates a mask-guided channel-wise attention mechanism that adaptively merges segmentation information with texture statistics to reconcile structural guidance with textural fidelity. A thorough comparative analysis was performed utilizing three internal variants—Vanilla GAN, Wasserstein GAN (WGAN), and WGAN-GP—against leading baselines, including TextureGAN and Pix2Pix. The assessment utilized a comprehensive multi-metric framework that included SSIM, FID, KID, LPIPS, and DISTS, in conjunction with a VGG-19 based perceptual loss. Experimental results indicate a notable divergence between pixel-wise accuracy and perceptual realism; although established baselines attained elevated PSNR values, the proposed Vanilla GAN and WGAN models exhibited enhanced perceptual fidelity, achieving the lowest LPIPS and DISTS scores. The WGAN-GP model, although theoretically stable, produced smoother but less complex textures due to the regularization enforced by the gradient penalty term. Ablation investigations further validated that the attention mechanism consistently enhanced structural alignment and texture sharpness across all topologies. Thus, the study suggests that Vanilla GAN and WGAN architectures, enhanced by attention-based fusion, offer an optimal balance between realism and structural fidelity for high-frequency texture creation applications. Full article
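As a rough illustration of the mask-guided channel-wise attention idea described in this abstract, the following PyTorch-style sketch re-weights feature channels using statistics pooled from both the generator features and a segmentation mask; module and parameter names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaskGuidedChannelAttention(nn.Module):
    """Illustrative sketch: re-weight feature channels using statistics
    pooled under a segmentation mask (not the paper's exact design)."""

    def __init__(self, channels: int, mask_classes: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels + mask_classes, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) generator features; mask: (B, K, H, W) one-hot labels
        feat_stats = feats.mean(dim=(2, 3))          # global texture statistics, (B, C)
        mask_stats = mask.float().mean(dim=(2, 3))   # class coverage per image, (B, K)
        weights = self.mlp(torch.cat([feat_stats, mask_stats], dim=1))  # (B, C)
        return feats * weights.unsqueeze(-1).unsqueeze(-1)  # channel-wise rescaling

# Toy usage: 64-channel features, 8 facade material classes
attn = MaskGuidedChannelAttention(channels=64, mask_classes=8)
out = attn(torch.randn(2, 64, 32, 32), torch.randint(0, 2, (2, 8, 32, 32)))
```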

28 pages, 731 KB  
Article
Research on an Automatic Classification Method for Art Film Scenes Based on Image and Audio Deep Features
by Zhaojun An and Heinz D. Fill
Appl. Sci. 2025, 15(23), 12603; https://doi.org/10.3390/app152312603 - 28 Nov 2025
Viewed by 518
Abstract
This paper addresses the challenging task of automatic scene classification in art films, a genre characterized by symbolic visuals, asynchronous audio, and non-linear storytelling. We propose Styloformer, a multimodal transformer architecture designed to integrate visual, auditory, textual, and curatorial signals into a unified representation space. The model combines cross-modal attention, stylistic clustering, influence prediction, and canonicality estimation to handle the semantic and historical complexity of art cinema. Additionally, we introduce a novel module called Historiographic Navigation, which embeds ontological priors and temporal logic to support interpretive reasoning. Evaluated on multiple benchmarks, Styloformer achieves state-of-the-art performance, including 91.85% accuracy and 94.31% AUC on the MovieNet dataset—outperforming baselines such as CLIP and ViT. Ablation studies further demonstrate the importance of each architectural component. Unlike general-purpose video models, our system is tailored to the aesthetic and narrative structure of art films, making it suitable for applications in digital curation and computational film analysis. Styloformer represents a scalable and interpretable approach to understanding artistic media, bridging machine learning with art historical reasoning. Full article
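Cross-modal attention of the kind this abstract describes (one modality's tokens attending over another's) is commonly built from a standard multi-head attention layer; the sketch below is a generic illustration under that assumption, not the Styloformer module itself.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Generic sketch: visual tokens attend over audio tokens
    (illustrative of cross-modal fusion, not the paper's exact module)."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # visual: (B, Nv, D) queries; audio: (B, Na, D) keys and values
        fused, _ = self.attn(query=visual, key=audio, value=audio)
        return self.norm(visual + fused)  # residual connection keeps the visual stream intact

# Toy usage: 16 visual tokens fused with 32 audio tokens
layer = CrossModalAttention()
out = layer(torch.randn(2, 16, 256), torch.randn(2, 32, 256))
```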

25 pages, 12345 KB  
Article
SmaAt-UNet Optimized by Particle Swarm Optimization (PSO): A Study on the Identification of Detachment Diseases in Ming Dynasty Temple Mural Paintings in North China
by Chuanwen Luo, Zikun Shang, Yan Zhang, Hao Pan, Abdusalam Nuermaimaiti, Chenlong Wang, Ning Li and Bo Zhang
Appl. Sci. 2025, 15(22), 12295; https://doi.org/10.3390/app152212295 - 19 Nov 2025
Viewed by 483
Abstract
The temple mural paintings of the Ming Dynasty in China are highly valuable cultural heritage. However, murals in North China have long faced deterioration such as pigment-layer detachment, which seriously threatens their preservation and study, gradually leading to cultural incompleteness and impeding protection decisions. This study proposes a coherent deep-learning technical paradigm, constructs a mural dataset, compares the performance of multiple models, and optimizes the selected model to enable automatic identification of mural detachment. The study applies five segmentation models—UNet, U2-NetP, SegNet, NestedUNet, and SmaAt-UNet—to perform a systematic comparison under the same conditions on 37,685 image slices, and evaluates their performance using four metrics: IoU, Dice, MAE, and mPA. Owing to its lightweight structure and attention-enhanced feature-extraction module, SmaAt-UNet effectively preserves mural edge details and performs best at identifying pigment-layer detachment. After introducing Particle Swarm Optimization (PSO), the IoU of the SmaAt-UNet model on the dataset increased to 73.25%, the Dice increased to 79.36%, the mPA increased to 97.02%, and the MAE decreased from 0.0592 to 0.0455, corresponding to an absolute reduction of 0.0137, and the model’s generalization ability and edge-recognition accuracy were significantly enhanced. This study constructs a systematic identification framework for pigment layer detachment in Ming Dynasty (1368–1644 AD) temple murals, closely combining deep learning technology with cultural heritage protection. It not only realizes the automatic identification of disease areas but also provides technical support for preventive protection and the construction of digital archives. Full article
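The Particle Swarm Optimization step used to tune the segmentation model follows the standard velocity/position update loop; the NumPy sketch below uses a toy objective standing in for the validation loss of SmaAt-UNet, and all swarm parameters are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def pso(objective, bounds, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO loop (minimization). `objective` maps a parameter vector,
    e.g. (learning_rate, dropout), to a validation loss."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()

    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Toy objective standing in for "validation loss of the model under these hyperparameters"
best, best_val = pso(lambda p: (p[0] - 1e-3) ** 2 + (p[1] - 0.2) ** 2,
                     bounds=[(1e-5, 1e-1), (0.0, 0.5)])
```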

18 pages, 3102 KB  
Article
MFFN-FCSA: Multi-Modal Feature Fusion Networks with Fully Connected Self-Attention for Radar Space Target Recognition
by Leiyao Liao, Yunda Jiang, Gengxin Zhang and Ziwei Liu
Appl. Sci. 2025, 15(22), 11940; https://doi.org/10.3390/app152211940 - 10 Nov 2025
Viewed by 649
Abstract
Radar space target recognition faces inherent challenges due to complex electromagnetic scattering properties and limited training samples. Conventional single-modality approaches cannot fully characterize targets due to information incompleteness, and existing multi-modal fusion methods often neglect deep exploration of cross-modal feature correlations. To address this issue, this paper presents a novel multi-modal feature fusion network with fully connected self-attention (MFFN-FCSA) for robust radar space target recognition. The proposed framework innovatively integrates multi-modal radar data, including high-resolution range profiles (HRRPs) and inverse synthetic aperture radar (ISAR) images, to exploit the complementary characteristics comprehensively. Our MFFN-FCSA consists of three modules: the parallel convolutional branches for modality-specific feature extraction of HRRPs and ISAR images, an FCSA-based fusion module for cross-modal feature fusion, and a classification head. Specifically, the designed FCSA fusion module simultaneously learns spatial and channel-wise dependencies via a fully connected self-attention mechanism, which enables learning dynamic weights of discriminative features across modalities. Furthermore, our end-to-end MFFN-FCSA model incorporates a composite loss function that combines a focal cross-entropy loss to address class imbalance and a triplet margin loss for enhanced metric learning. Experimental results based on a space target dataset with 10 categories show the high recognition accuracy of our model compared to related single-modality and existing fusion approaches, particularly showing promising generalization capabilities on few-shot and polarization variation scenarios. Full article
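The composite objective described above, a focal cross-entropy term plus a triplet margin term, can be sketched as follows in PyTorch; the loss weighting and the way triplets are supplied are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalPlusTripletLoss(nn.Module):
    """Sketch of a composite objective: focal cross-entropy for class imbalance
    plus a triplet margin term for metric learning (weights are illustrative)."""

    def __init__(self, gamma: float = 2.0, margin: float = 0.3, triplet_weight: float = 0.5):
        super().__init__()
        self.gamma = gamma
        self.triplet = nn.TripletMarginLoss(margin=margin)
        self.triplet_weight = triplet_weight

    def forward(self, logits, labels, anchor, positive, negative):
        ce = F.cross_entropy(logits, labels, reduction="none")
        pt = torch.exp(-ce)                         # probability assigned to the true class
        focal = ((1 - pt) ** self.gamma * ce).mean()
        return focal + self.triplet_weight * self.triplet(anchor, positive, negative)

# Toy usage with random fused embeddings and logits (batch of 8, 10 target categories)
loss_fn = FocalPlusTripletLoss()
loss = loss_fn(torch.randn(8, 10), torch.randint(0, 10, (8,)),
               torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128))
```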

17 pages, 1049 KB  
Article
Learning Part-Based Features for Vehicle Re-Identification with Global Context
by Rajsekhar Kumar Nath and Debjani Mitra
Appl. Sci. 2025, 15(13), 7041; https://doi.org/10.3390/app15137041 - 23 Jun 2025
Viewed by 1808
Abstract
Re-identification in automated surveillance systems is a challenging deep learning problem. Learning part-based features augmented with one or more global features is an efficient approach for enhancing the performance of re-identification networks. However, the latter may increase the number of trainable parameters, leading to unacceptable complexity. We propose a novel part-based model that unifies a global component by taking the distances of the parts from the global feature vector and using them as loss weights during the training of the individual parts, without increasing complexity. We conduct extensive experiments on two large-scale standard vehicle re-identification datasets to test, validate, and perform a comparative performance analysis of the proposed approach, which we named the global–local similarity-induced part-based network (GLSIPNet). The results show that our method outperforms the baseline by 2.5% (mAP) in the case of the VeRi dataset and by 2.4%, 3.3%, and 2.8% (mAP) for small, medium, and large variants of the VehicleId dataset, respectively. It also performs on par with state-of-the-art methods in the literature used for comparison. Full article
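The central idea, using each part feature's distance from the global feature vector as a weight on that part's training loss, might be sketched as below; the distance normalization and tensor shapes are illustrative assumptions, not the GLSIPNet implementation.

```python
import torch
import torch.nn.functional as F

def part_losses_weighted_by_global_distance(part_feats, global_feat, part_logits, labels):
    """Sketch: weight each part's classification loss by how far that part's
    feature lies from the global feature (normalization is illustrative)."""
    # part_feats: (P, B, D), global_feat: (B, D), part_logits: (P, B, num_classes)
    dists = torch.stack([torch.norm(p - global_feat, dim=1) for p in part_feats])  # (P, B)
    weights = dists / dists.sum(dim=0, keepdim=True)           # normalize across parts
    losses = torch.stack([F.cross_entropy(logits, labels, reduction="none")
                          for logits in part_logits])           # (P, B)
    return (weights * losses).sum(dim=0).mean()                 # weighted sum over parts

# Toy usage: 4 parts, batch of 8, 128-D features, 100 identity classes
loss = part_losses_weighted_by_global_distance(
    torch.randn(4, 8, 128), torch.randn(8, 128),
    torch.randn(4, 8, 100), torch.randint(0, 100, (8,)))
```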

36 pages, 6781 KB  
Article
A Comparative Study of Azure Custom Vision Versus Google Vision API Integrated into AI Custom Models Using Object Classification for Residential Waste
by Cosmina-Mihaela Rosca, Adrian Stancu and Marius Radu Tănase
Appl. Sci. 2025, 15(7), 3869; https://doi.org/10.3390/app15073869 - 1 Apr 2025
Cited by 13 | Viewed by 2436
Abstract
The residential separate collection of waste is the first stage in waste recyclability for sustainable development. The paper focuses on designing and implementing a low-cost residential automatic waste sorting bin (RBin) for recycling, alleviating the user’s classification burden. Next, an analysis of two object identification and classification models was conducted to sort materials into the categories of cardboard, glass, plastic, and metal. A major challenge in sorting classification is distinguishing between glass and plastic due to their similar visual characteristics. The research assesses the performance of the Azure Custom Vision Service (ACVS) model, which achieves high accuracy on training data but underperforms in real-time applications, with an accuracy of 95.13%. In contrast, the second model, the Custom Waste Sorting Model (CWSM), demonstrates high accuracy (96.25%) during training and proves to be effective in real-time applications. The CWSM uses a two-tier approach, first identifying the object descriptively using the Google Vision API Service (GVAS) model, followed by classification through the CWSM, a predicate-based custom model. The CWSM employs the LbfgsMaximumEntropyMulti algorithm and a dataset of 1000 records for training, divided equally across the categories. This study proposes an innovative evaluation metric, the Weighted Classification Confidence Score (WCCS). The results show that the CWSM outperforms ACVS in real-world testing, achieving a real accuracy of 99.75% after applying the WCCS. The paper explores the importance of customized models over pre-implemented services when the model relies on descriptive characteristics rather than pixel-by-pixel examination. Full article
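The two-tier approach, first obtaining descriptive labels from a generic vision service and then classifying those descriptions with a small custom model, can be illustrated with the sketch below; get_descriptive_labels is a hypothetical stub for the vision-service call, and scikit-learn's LogisticRegression stands in for the paper's maximum-entropy model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def get_descriptive_labels(image_path: str) -> str:
    """Hypothetical stand-in for a generic vision service that returns
    descriptive tags for an image (e.g. 'bottle transparent container')."""
    return "bottle transparent container"  # placeholder output

# Tier 2: a small custom classifier mapping descriptive tags to a waste category.
# LogisticRegression stands in for a maximum-entropy model; data are toy examples.
train_tags = ["cardboard box brown paper", "bottle transparent container",
              "can metal shiny aluminium", "jar glass transparent"]
train_labels = ["cardboard", "plastic", "metal", "glass"]
classifier = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(train_tags, train_labels)

# Tier 1 output feeds Tier 2
tags = get_descriptive_labels("item.jpg")
predicted_category = classifier.predict([tags])[0]
```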

20 pages, 2518 KB  
Article
Designing and Implementing a Public Urban Transport Scheduling System Based on Artificial Intelligence for Smart Cities
by Cosmina-Mihaela Rosca, Adrian Stancu, Cosmin-Florinel Neculaiu and Ionuț-Adrian Gortoescu
Appl. Sci. 2024, 14(19), 8861; https://doi.org/10.3390/app14198861 - 2 Oct 2024
Cited by 14 | Viewed by 5706
Abstract
Many countries encourage their populations to use public urban transport to decrease pollution and traffic congestion. However, this can generate overcrowded routes at certain times and low economic efficiency for public urban transport companies when buses carry few passengers. This article proposes a Public Urban Transport Scheduling System (PUTSS) algorithm for allocating a public urban transport fleet based on the number of passengers waiting for a bus and considering the efficiency of public urban transport companies. The PUTSS algorithm integrates artificial intelligence (AI) methods to identify the number of people waiting at each station through real-time image acquisition. The technique presented is Azure Computer Vision. In a case study, the accuracy of correctly identifying the number of persons in an image was computed using the Microsoft Azure Computer Vision service. The proposed PUTSS algorithm also uses Google Maps Service for congestion-level identification. Employing these modern tools in the algorithm makes improving public urban transport services possible. The algorithm is integrated into a software application developed in C#, simulating a real-world scenario involving two public urban transport vehicles. The global accuracy rate of 89.81% demonstrates the practical applicability of the software product. Full article
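A greatly simplified version of demand-driven dispatch, allocating buses in proportion to the number of passengers counted at stations, could look like the sketch below; the passenger counts are stubbed in place of the image-based detection step, and the capacity and minimum-load values are illustrative assumptions rather than the paper's parameters.

```python
from dataclasses import dataclass

@dataclass
class Station:
    name: str
    waiting: int  # passengers counted in the latest camera frame (stubbed here)

def buses_needed(stations: list[Station], bus_capacity: int = 40, min_load: int = 10) -> int:
    """Sketch of a demand-driven dispatch rule: send enough buses to cover the
    waiting passengers on a route, but none if demand is below a minimum load."""
    total_waiting = sum(s.waiting for s in stations)
    if total_waiting < min_load:
        return 0                              # not economical to dispatch a bus
    return -(-total_waiting // bus_capacity)  # ceiling division

# Stubbed counts standing in for the image-based person detection step
route = [Station("A", 12), Station("B", 35), Station("C", 7)]
print(buses_needed(route))  # 2 buses for 54 waiting passengers
```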
