Perception and Detection of Intelligent Vision

A special issue of Big Data and Cognitive Computing (ISSN 2504-2289).

Deadline for manuscript submissions: closed (30 April 2025) | Viewed by 26709

Special Issue Editors

Prof. Dr. Hongshan Yu

Guest Editor

School of Robotics, Hunan University, Changsha 410082, China.
Interests: robotic perception; machine learning; pattern recognition

Dr. Zhengeng Yang

Guest Editor

School of Engineering and Design, Hunan Normal University, Changsha 410081, China
Interests: computer vision; deep learning; few-shot learning; representation learning

Dr. Mingtao Feng

Guest Editor

The School of Artificial Intelligence, Xidian University, Xi'an 710126, China
Interests: computer vision; 3D vision; scene understanding

Dr. Qieshi Zhang

Guest Editor

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Interests: computer vision; autonomous driving; intelligent robots; human–computer interaction; action recognition

Special Issue Information

Dear Colleagues,

Vision plays the most important role in the cognitive system of humans. Constructing an intelligent vision system that can achieve part or even all of the capabilities of the human vision system is one of the ultimate goals of many research areas such as computer vision, artificial intelligence, and cognitive science. In recent years, based on the significant amount of internet data, data-driven methods have greatly advanced the frontiers of intelligent vision, although such systems are not yet as powerful as the human vision system.

The aim of this Special Issue is to explore recent advances in visual perception in relation to the fields of computer vision and cognitive science. This Special Issue will bring together leading researchers and developers to present their latest research on algorithm design, system frameworks, and cognitive theories for developing intelligent vision systems. In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

Computer vision;
Robot vision;
Visual perception;
Scene understanding;
3D vision;
Deep learning;
Visual representation learning;
Intelligent vision device;
Large vision model;
Unsupervised learning;
Multi-modal learning;
Visual cognition theories;
Action recognition and understanding;
Human-computer interaction.

We look forward to receiving your contributions.

Prof. Dr. Hongshan Yu
Dr. Zhengeng Yang
Dr. Mingtao Feng
Dr. Qieshi Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

visual perception
deep learning
intelligent vision systems
vision cognition
unsupervised learning
large models

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)

Research

23 pages, 811 KB

Open Access Article

Efficient Dynamic Emotion Recognition from Facial Expressions Using Statistical Spatio-Temporal Geometric Features

by Yacine Yaddaden

Big Data Cogn. Comput. 2025, 9(8), 213; https://doi.org/10.3390/bdcc9080213 - 19 Aug 2025

Cited by 2 | Viewed by 2591

Abstract

Automatic Facial Expression Recognition (AFER) is a key component of affective computing, enabling machines to recognize and interpret human emotions across various applications such as human–computer interaction, healthcare, entertainment, and social robotics. Dynamic AFER systems, which exploit image sequences, can capture the temporal evolution of facial expressions but often suffer from high computational costs, limiting their suitability for real-time use. In this paper, we propose an efficient dynamic AFER approach based on a novel spatio-temporal representation. Facial landmarks are extracted, and all possible Euclidean distances are computed to model the spatial structure. To capture temporal variations, three statistical metrics are applied to each distance sequence. A feature selection stage based on the Extremely Randomized Trees (ExtRa-Trees) algorithm is then performed to reduce dimensionality and enhance classification performance. Finally, the emotions are classified using a linear multi-class Support Vector Machine (SVM) and compared against the k-Nearest Neighbors (k-NN) method. The proposed approach is evaluated on three benchmark datasets: CK+, MUG, and MMI, achieving recognition rates of 94.65%, 93.98%, and 75.59%, respectively. Our results demonstrate that the proposed method achieves a strong balance between accuracy and computational efficiency, making it well-suited for real-time facial expression recognition applications.

(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
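
The feature-extraction step summarized in the abstract above (pairwise landmark distances, then per-distance statistics over time) can be illustrated with a minimal sketch. The abstract does not say which three statistical metrics are used, so mean, standard deviation, and range are assumed here purely for illustration; the function names and the toy coordinates are hypothetical, and the feature selection and SVM stages are not shown.

```python
import math
import statistics
from itertools import combinations


def pairwise_distances(landmarks):
    """All Euclidean distances between distinct landmark pairs in one frame."""
    return [math.dist(p, q) for p, q in combinations(landmarks, 2)]


def sequence_features(frames):
    """Summarise how each pairwise distance evolves over the sequence.

    Assumption: the three statistics are mean, population standard
    deviation, and range. The abstract does not name the actual metrics.
    """
    per_frame = [pairwise_distances(frame) for frame in frames]
    features = []
    for series in zip(*per_frame):  # one time series per landmark pair
        features.extend([
            statistics.fmean(series),
            statistics.pstdev(series),
            max(series) - min(series),
        ])
    return features


# Toy sequence: 3 landmarks tracked over 2 frames (hypothetical coordinates).
frames = [
    [(0.0, 0.0), (3.0, 4.0), (0.0, 4.0)],
    [(0.0, 0.0), (6.0, 8.0), (0.0, 4.0)],
]
features = sequence_features(frames)
```

With n landmarks this yields 3 · n(n−1)/2 features per sequence, which suggests why a dimensionality-reduction stage follows in the described pipeline.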

16 pages, 27206 KB

Open Access Article

RecurrentOcc: An Efficient Real-Time Occupancy Prediction Model with Memory Mechanism

by Zimo Chen, Yuxiang Xie and Yingmei Wei

Big Data Cogn. Comput. 2025, 9(7), 176; https://doi.org/10.3390/bdcc9070176 - 2 Jul 2025

Viewed by 3023

Abstract

Three-dimensional Occupancy Prediction provides a detailed representation of the surrounding environment, essential for autonomous driving. Long temporal image sequence fusion is a common technique used to improve the occupancy prediction performance. However, existing temporal fusion methods are inefficient due to three issues: repetitive feature extraction from temporal images, redundant fusion of temporal features, and suboptimal fusion of long-term historical features. To address these challenges, we propose the Recurrent Occupancy Prediction Network (RecurrentOcc). We introduce the Scene Memory Gate, a new temporal fusion module that condenses temporal scene features into a single historical feature map. This eliminates the need for repeated extraction and aggregation of multiple temporal images, reducing computational overhead. The Scene Memory Gate selectively retains valuable information from historical features and recurrently updates the historical feature map, enhancing temporal fusion performance. Additionally, we design a simple yet efficient encoder, significantly reducing the number of model parameters. Compared with other real-time methods, RecurrentOcc achieves state-of-the-art performance of 39.9 mIoU on the Occ3D-NuScenes dataset with the fewest parameters of 59.1 M and an inference speed of 23.4 FPS.

(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
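
The idea of condensing a sequence into a single, recurrently updated feature map can be illustrated with a generic gated update. This is not the paper's Scene Memory Gate, whose internals the abstract does not describe; it is a plain convex-combination gate with hand-supplied logits, and every name and value below is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_update(memory, current, gate_logits):
    """Blend the running memory with the current frame, cell by cell.

    A gate near 1 keeps the historical value; a gate near 0 overwrites it
    with the current feature. In a trained network the logits would come
    from learned layers; here they are supplied by hand (an assumption).
    """
    return [
        sigmoid(z) * m + (1.0 - sigmoid(z)) * c
        for m, c, z in zip(memory, current, gate_logits)
    ]


# Three frames of a flattened 4-cell feature map (hypothetical values).
frames = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]
memory = frames[0]
for frame in frames[1:]:
    # Logit 0 gives a gate of exactly 0.5 for every cell.
    memory = gated_update(memory, frame, gate_logits=[0.0] * 4)
```

The memory stays the same size however many frames are processed, so each new frame needs only one update rather than a re-fusion of the whole history.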

25 pages, 13698 KB

Open Access Editor’s Choice Article

Self-Supervised Foundation Model for Template Matching

by Anton Hristov, Dimo Dimov and Maria Nisheva-Pavlova

Big Data Cogn. Comput. 2025, 9(2), 38; https://doi.org/10.3390/bdcc9020038 - 11 Feb 2025

Cited by 5 | Viewed by 3698

Abstract

Finding a template location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when insufficient training data are available or when large texture variations, different modalities, or weak visual features are present in the images, which limits their applicability to real-world tasks. We introduce Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning for template matching. The idea behind Self-TM is to learn hierarchical features incorporating localization properties from images without any annotations. In deeper convolutional neural network (CNN) layers, the filters begin to react to more complex structures and their receptive fields increase. This leads to a loss of localization information compared with the early layers. The hierarchical propagation of the last layers back to the first layer results in precise template localization. Due to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model.

(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)

18 pages, 29962 KB

Open Access Article

Eliciting Emotions: Investigating the Use of Generative AI and Facial Muscle Activation in Children’s Emotional Recognition

by Manuel A. Solis-Arrazola, Raul E. Sanchez-Yanez, Ana M. S. Gonzalez-Acosta, Carlos H. Garcia-Capulin and Horacio Rostro-Gonzalez

Big Data Cogn. Comput. 2025, 9(1), 15; https://doi.org/10.3390/bdcc9010015 - 20 Jan 2025

Cited by 5 | Viewed by 4584

Abstract

This study explores children’s emotions through a novel approach of Generative Artificial Intelligence (GenAI) and Facial Muscle Activation (FMA). It examines GenAI’s effectiveness in creating facial images that produce genuine emotional responses in children, alongside FMA’s analysis of muscular activation during these expressions. The aim is to determine if AI can realistically generate and recognize emotions similar to human experiences. The study involves generating a database of 280 images (40 per emotion) of children expressing various emotions. For real children’s faces from public databases (DEFSS and NIMH-CHEFS), five emotions were considered: happiness, anger, fear, sadness, and neutral. In contrast, for AI-generated images, seven emotions were analyzed, including the previous five plus surprise and disgust. A feature vector is extracted from these images, indicating lengths between reference points on the face that contract or expand based on the expressed emotion. This vector is then input into an artificial neural network for emotion recognition and classification, achieving accuracies of up to 99% in certain cases. This approach offers new avenues for training and validating AI algorithms, enabling models to be trained with artificial and real-world data interchangeably. The integration of both datasets during training and validation phases enhances model performance and adaptability.

(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)

18 pages, 5855 KB

Open Access Article

Suspension Parameter Estimation Method for Heavy-Duty Freight Trains Based on Deep Learning

by Changfan Zhang, Yuxuan Wang and Jing He

Big Data Cogn. Comput. 2024, 8(12), 181; https://doi.org/10.3390/bdcc8120181 - 4 Dec 2024

Cited by 1 | Viewed by 1819

Abstract

The suspension parameters of heavy-duty freight trains can deviate from their initial design values due to material aging and performance degradation. Because traditional multibody dynamics simulation models are usually designed for fixed working conditions, it is difficult for them to adequately analyze the safety status of the vehicle–line system in actual operation. To address this issue, this research provides a suspension parameter estimation technique based on CNN-GRU. Firstly, a prototype C80 train was utilized to build a simulation model for multibody dynamics. Secondly, six key suspension parameters for wheel–rail force were selected using the Sobol global sensitivity analysis method. Then, a CNN-GRU proxy model was constructed, with the actually measured wheel–rail forces as a reference. By combining this approach with NSGA-II (Non-dominated Sorting Genetic Algorithm II), the key suspension parameters were calculated. Finally, the estimated parameter values were applied to the vehicle–line coupled multibody dynamical model and validated. The results show that, with the corrected dynamical model, the relative errors of the simulated wheel–rail force are reduced from 9.28%, 6.24% and 18.11% to 7%, 4.52% and 10.44%, corresponding to straight, curve, and long and steep uphill conditions, respectively. The wheel–rail force simulation’s precision is increased, indicating that the proposed method is effective in estimating the suspension parameters for heavy-duty freight trains.

(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)

14 pages, 7338 KB

Open Access Article

Strawberry Ripeness Detection Using Deep Learning Models

by Zhiyuan Mi and Wei Qi Yan

Big Data Cogn. Comput. 2024, 8(8), 92; https://doi.org/10.3390/bdcc8080092 - 15 Aug 2024

Cited by 14 | Viewed by 5710

Abstract

In agriculture, the timely and accurate assessment of fruit ripeness is crucial to optimizing harvest planning and reducing waste. In this article, we explore the integration of two cutting-edge deep learning models, YOLOv9 and Swin Transformer, to develop a complex model for detecting strawberry ripeness. Trained and tested on a specially curated dataset, our model achieves a mean average precision (mAP) of 87.3% at an intersection over union (IoU) threshold of 0.5. This outperforms the model using YOLOv9 alone, which achieves an mAP of 86.1%. Our model also demonstrated improved precision and recall, with precision rising to 85.3% and recall rising to 84.0%, reflecting its ability to accurately and consistently detect different stages of strawberry ripeness.

(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
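
The IoU threshold of 0.5 mentioned in the abstract above determines when a predicted box counts as a correct detection. A minimal sketch of that check follows, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the helper names and coordinates are hypothetical, and a full mAP computation (ranking by confidence, per-class averaging) is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold=0.5):
    """A prediction matches a ground-truth box if IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


truth = (0, 0, 10, 10)
good = (0, 0, 10, 8)    # overlaps 80 of 100 units of union
poor = (5, 5, 15, 15)   # overlaps 25 of 175 units of union
```

In a detector evaluation the predicted class label would also have to agree with the ground truth before a match is counted.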

24 pages, 1853 KB

Open Access Article

Optimal Image Characterization for In-Bed Posture Classification by Using SVM Algorithm

by Claudia Angelica Rivera-Romero, Jorge Ulises Munoz-Minjares, Carlos Lastre-Dominguez and Misael Lopez-Ramirez

Big Data Cogn. Comput. 2024, 8(2), 13; https://doi.org/10.3390/bdcc8020013 - 26 Jan 2024

Cited by 10 | Viewed by 3394

Abstract

Identifying patient posture while they are lying in bed is an important task in medical applications such as monitoring a patient after a surgical intervention, sleep supervision to identify behavioral and physiological markers, or for bedsore prevention. An acceptable strategy to identify the patient’s position is the classification of images created from a grid of pressure sensors located in the bed. These samples can be arranged based on supervised learning methods. Usually, image conditioning is required before images are loaded into a learning method to increase classification accuracy. However, continuous monitoring of a person requires large amounts of time and computational resources if complex pre-processing algorithms are used. So, the problem is to classify the image posture of patients with different weights, heights, and positions by using minimal sample conditioning for a specific supervised learning method. In this work, it is proposed to identify the patient posture from pressure sensor images by using well-known and simple conditioning techniques and selecting the optimal texture descriptors for the Support Vector Machine (SVM) method. This is in order to obtain the best classification and to avoid image over-processing in the conditioning stage for the SVM. The experimental stages are performed with the color models Red, Green, and Blue (RGB) and Hue, Saturation, and Value (HSV). The results show an increase in accuracy from 86.9% to 92.9% and in kappa value from 0.825 to 0.904 using image conditioning with histogram equalization and a median filter, respectively.

(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
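
The kappa value reported alongside accuracy in the abstract above is presumably Cohen's kappa, which corrects observed agreement for agreement expected by chance. A minimal sketch of the standard formula follows; the confusion matrix is a hypothetical two-class toy example, not data from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement: product of marginal frequencies, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2-class confusion matrix with 100 samples.
confusion = [
    [45, 5],
    [10, 40],
]
kappa = cohen_kappa(confusion)
accuracy = (45 + 40) / 100
```

Here accuracy is 0.85 but kappa is lower, because half of the agreement would be expected by chance with these class totals.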
