AI, Computer Vision and Human–Robot Interaction

Editors
Special Issue Information
Keywords
Benefits of Publishing in a Special Issue
Published Papers

A special issue of Big Data and Cognitive Computing (ISSN 2504-2289).

Deadline for manuscript submissions: 23 September 2026 | Viewed by 5728

Editors

Dr. Hannan Azhar

Guest Editor

Computing, AI and Cyber Security, Canterbury Christ Church University, Canterbury, UK
Interests: artificial intelligence; computer vision; explainable AI; transformer-based models; assistive robotics; ambient assisted living; autonomous systems

Dr. Md Nazmul Huda

Guest Editor

Department of Electronic and Electrical Engineering, Brunel University London, Kingston Ln, Uxbridge UB8 3PH, UK
Interests: robotics; computer vision; mobile capsule robots; capsule endoscopy; artificial intelligence; deep learning; sensor fusion; autonomous vehicles; search and rescue robots

Prof. Dr. Hongying Meng

Guest Editor

Department of Electronic and Electrical Engineering, Brunel University of London, London UB8 3PH, UK
Interests: biomedical signal processing; machine learning; image processing; human–computer interaction; embedded systems and communications

Special Issue Information

Dear Colleagues,

The convergence of Artificial Intelligence (AI), Computer Vision, and Human–Robot Interaction (HRI) is accelerating the development of intelligent, autonomous, and interactive robotic systems. From autonomous navigation and human–robot interaction to environmental sensing and healthcare assistance, robots must operate intelligently and safely in real-world settings. Advances in deep learning, transformer-based models, and sim-to-real learning are driving rapid improvements in perception, planning, and adaptability. However, these advances must be complemented by explainability and robust generalization, especially in domains such as Ambient Assisted Living (AAL), healthcare, swarm-based sensing, and field robotics, where reliability and trust are critical. This Special Issue highlights recent advances that enable robots to perceive, interpret, and act reliably in complex, dynamic, and human-centered environments. Core technologies of interest include transformer-based architectures, multimodal learning, transfer learning, and Explainable AI (XAI), each contributing to greater adaptability, transparency, and trust in real-world robotic applications.

We welcome contributions ranging from novel algorithms to integrated frameworks and toolkits that support development, reproducibility, and deployment. Particular interest lies in applied research across impactful domains such as Ambient Assisted Living (AAL), healthcare, autonomous off-road navigation, and swarm robotics, where coordination, safety, and interpretability are essential to success.

This Special Issue invites interdisciplinary research on perception, learning, and interaction in intelligent robotic systems. We welcome both algorithmic innovations and deployed systems that bridge core AI with practical robotic applications.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

Transformer-based models and attention mechanisms in robotic perception;
Multimodal learning and sensor fusion across vision, language, and tactile inputs;
Transfer learning and domain adaptation for visual and behavioral generalization;
Explainable AI (XAI) in robotic systems, especially in safety-critical domains such as healthcare, AAL, and swarm robotics;
Vision-based human–robot interaction, including gesture, emotion, and behavior recognition;
Reinforcement and imitation learning for adaptive robotic behavior;
Sim-to-real transfer for deploying learned models in physical systems;
Vision-based navigation, terrain understanding, and autonomous exploration in unstructured environments;
Swarm robotics and AI coordination for collaborative sensing and decision-making;
Collaborative, assistive, and human-in-the-loop robotic systems;
Frameworks, toolkits, or platforms supporting perception, learning, or explainability;
Ethical and socially responsible AI in robotic applications.

We look forward to receiving your contributions.

Dr. Hannan Azhar
Dr. Md Nazmul Huda
Prof. Dr. Hongying Meng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal, as soon as they are accepted, and are listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-anonymized peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

artificial intelligence
computer vision
explainable AI (XAI)
transformer-based models
transfer learning
sim-to-real learning
human–robot interaction
swarm robotics
autonomous navigation
assistive robotics

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (6 papers)

Research

31 pages, 20126 KB

Open Access Article

Visual Safe Human-to-Humanoid Motion Imitation

by Wenqi Cai, John Abanes, Nikolaos Evangeliou and Anthony Tzes

Big Data Cogn. Comput. 2026, 10(8), 246; https://doi.org/10.3390/bdcc10080246 (registering DOI) - 24 Jul 2026

Abstract

Safe human-to-humanoid motion imitation is crucial for shared environments, where direct motion retargeting may induce self-collision or human–humanoid collision due to embodiment mismatch, kinematic limits, perception uncertainty, and human proximity. This paper presents an online vision-aided safe human-to-humanoid motion imitation framework that integrates skeleton-based upper-body pose estimation, joint-space retargeting, and capsule-based Control Barrier Function Quadratic Program (CBF-QP) safety filtering. Human skeletal observations are mapped to a reduced eight-degree-of-freedom (8-DoF) humanoid upper-body command, while the CBF-QP layer computes a safety-corrected target that minimally modifies the nominal imitation command subject to robot self-collision and human–humanoid collision constraints. The framework is evaluated via simulation and hardware experiments under representative self-collision and human–humanoid interaction episodes, complemented by a comparative benchmark against velocity damping and potential field baselines. Furthermore, this work introduces an evaluation protocol combining geometric safety, command deviation, and local-link similarity metrics to systematically characterize the safety–imitation trade-off governed by CBF parameters. The results demonstrate that, within the tested moderate-speed regime, the proposed framework substantially reduces geometric collision violations while balancing imitation fidelity with online computational feasibility, thereby providing a viable foundation for safe human-guided humanoid motion deployment.
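As a minimal sketch of the kind of CBF-QP safety filtering described in this abstract, the Python snippet below solves a QP with a single affine barrier constraint in closed form. It assumes a point robot with single-integrator dynamics (ẋ = u) and one spherical obstacle with barrier h(x) = ||x − o||² − r²; the function name `cbf_filter`, the gain `alpha`, and this geometry are illustrative assumptions, not the authors' capsule-based 8-DoF formulation, which involves many simultaneous constraints and generally needs a numerical QP solver.

```python
def cbf_filter(x, u_nom, obstacle, radius, alpha=1.0):
    """Minimally modify u_nom so a point robot (x_dot = u) keeps
    h(x) = ||x - obstacle||^2 - radius^2 >= 0 (hypothetical toy setup).

    Solves  min ||u - u_nom||^2  s.t.  grad_h(x) . u >= -alpha * h(x).
    With one affine constraint the QP reduces to projecting u_nom onto
    a half-space, which has a closed-form solution.
    """
    diff = [xi - oi for xi, oi in zip(x, obstacle)]
    h = sum(d * d for d in diff) - radius ** 2
    a = [2.0 * d for d in diff]        # gradient of h with respect to x
    b = -alpha * h                     # CBF condition: a . u >= b
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if slack >= 0.0:
        return list(u_nom)             # nominal command is already safe
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("robot is at the obstacle center; gradient is zero")
    # Shift u_nom along a just enough to land on the constraint boundary
    return [ui - slack * ai / norm_sq for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    # Robot at (2, 0), unit obstacle at the origin, command heading into it
    print(cbf_filter((2.0, 0.0), (-5.0, 0.0), (0.0, 0.0), radius=1.0))
```

In this example the nominal command (−5, 0) drives straight at the obstacle, and the filter reduces it to (−0.75, 0), the smallest change that satisfies ḣ ≥ −αh with α = 1; a command that already satisfies the constraint is returned unchanged.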

(This article belongs to the Special Issue AI, Computer Vision and Human–Robot Interaction)

25 pages, 2280 KB

Open Access Article

Economical, Optimal and Uncertain Multiple-View L₂ Triangulation via LMIs

by Graziano Chesi

Big Data Cogn. Comput. 2026, 10(7), 222; https://doi.org/10.3390/bdcc10070222 - 5 Jul 2026

Viewed by 201

Abstract

This paper proposes a novel approach for multiple-view L₂ triangulation, a key problem in computer vision which consists of estimating a scene point from its estimated image projections on two or more cameras and from the estimated projection matrices of the cameras [...] L₂ norm. In the proposed approach, the estimated image projections are allowed to be uncertain in admissible regions described by polynomial inequalities and equalities, and an estimate of the scene point is obtained by solving a linear matrix inequality (LMI) problem built with matrix decompositions, polynomial multipliers, and the Gram matrix method. It is proven that the optimal estimate can always be achieved by using multipliers with sufficiently large degree. Moreover, a simple test is provided in order to establish the optimality of the obtained estimate. As shown by some examples with real and synthetic data, the proposed approach presents key advantages with respect to several existing methods of a different nature, which may fail to find the optimal estimate, may not allow one to establish the optimality of the found estimate, or may require a larger computational burden.
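To make the L₂ criterion concrete, the sketch below evaluates the quantity that L₂ triangulation minimizes: the sum of squared image distances between the reprojections of a candidate scene point and the observed image points. The helper names `project` and `l2_reprojection_cost` and the two toy cameras are hypothetical; the LMI relaxation with polynomial multipliers and the Gram matrix method from the paper are not reproduced here.

```python
def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P to pixel coordinates."""
    Xh = list(X) + [1.0]               # homogeneous coordinates
    u, v, w = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    return u / w, v / w


def l2_reprojection_cost(X, cameras, observations):
    """Sum of squared image distances between projected X and observed points."""
    cost = 0.0
    for P, (x_obs, y_obs) in zip(cameras, observations):
        x, y = project(P, X)
        cost += (x - x_obs) ** 2 + (y - y_obs) ** 2
    return cost


if __name__ == "__main__":
    # Two hypothetical normalized cameras, the second shifted by 1 along x
    P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
    X_true = (0.5, 0.2, 2.0)
    obs = [project(P1, X_true), project(P2, X_true)]
    print(l2_reprojection_cost(X_true, [P1, P2], obs))
```

With noise-free observations the cost is zero at the true point; with noisy observations, an L₂-optimal triangulation method seeks the point that minimizes this cost, which is a non-convex problem in general.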

(This article belongs to the Special Issue AI, Computer Vision and Human–Robot Interaction)

32 pages, 44770 KB

Open Access Article

Recognition of Acupoints on Human Back Based on Machine Vision and Deep Learning

by Zhike Zhao, Linman Song, Songying Li, Ruihao Xue and Peng Li

Big Data Cogn. Comput. 2026, 10(7), 204; https://doi.org/10.3390/bdcc10070204 - 23 Jun 2026

Viewed by 388

Abstract

Traditional acupoint localization methods rely heavily on manual operation, resulting in high subjectivity and limited accuracy. To improve the precision and stability of acupoint detection, this study integrates machine vision technology with in situ projection to achieve automated recognition and real-time visualization of human acupoints. First, an automatic calibration method based on image processing is proposed for back acupoints. Spinal features are extracted from the blue channel, enhanced using adaptive histogram equalization, and processed through region of interest extraction, minimum-threshold binarization, and morphological operations. Key spinal curve points are then fitted using Bézier functions. Canny edge detection is used to extract the human silhouette, locate the acromion, and derive the pixel scale of the “cun” measurement, enabling coordinate computation for 141 back acupoints. In the deep learning component, an improved YOLOv8-Pose model is developed for acupoint localization. Unlike existing methods that use local attention or the original Object Keypoint Similarity (OKS) loss, we introduce two innovations: a non-local attention module for global dependency modeling, and a novel Efficient Object Keypoint Similarity (EOKS) loss function that incorporates geometric constraints—namely, width, height, and center distance—in addition to Euclidean distance. A non-local attention mechanism is incorporated into the backbone to enhance global feature extraction, and the EOKS loss function is designed to improve spatiogeometric regression accuracy. An inference mechanism is further introduced to derive the remaining acupoints from 49 detected keypoints; experiments demonstrate that the improved model achieves 95.0% detection accuracy, outperforming the baseline by 2.62%, with an inference time of 14.5 ms. 
Finally, an in situ projection platform is constructed, combining camera calibration, four-point proportional scaling, and an OpenCV 4.5.4-based interactive interface. The system supports real-time translation, rotation, and scaling, enabling accurate projection of detected acupoints onto the human body.
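For reference, the baseline Object Keypoint Similarity that EOKS extends can be written, following the COCO keypoint convention, as OKS = Σᵢ exp(−dᵢ² / (2s²kᵢ²)) δ(vᵢ > 0) / Σᵢ δ(vᵢ > 0), where dᵢ is the distance between predicted and ground-truth keypoint i, s² is the object area, kᵢ is a per-keypoint falloff constant, and vᵢ is the visibility flag. The sketch below implements only this standard OKS; the abstract does not give the exact form of the width, height, and center-distance terms that EOKS adds, so they are not modeled, and the function name `oks` is illustrative.

```python
import math


def oks(pred, gt, visible, area, kappas):
    """COCO-style Object Keypoint Similarity for a single instance.

    pred, gt: lists of (x, y) keypoints; visible: per-keypoint booleans;
    area: object area (s^2); kappas: per-keypoint falloff constants k_i.
    Only labeled (visible) keypoints contribute to the average.
    """
    num, den = 0.0, 0
    for (px, py), (gx, gy), v, k in zip(pred, gt, visible, kappas):
        if not v:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        num += math.exp(-d2 / (2.0 * area * k ** 2))
        den += 1
    return num / den if den else 0.0
```

A perfect prediction scores 1.0, and each keypoint's contribution decays with its squared error scaled by object size, which is why OKS serves both as an evaluation metric and as the basis for keypoint regression losses.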

(This article belongs to the Special Issue AI, Computer Vision and Human–Robot Interaction)

25 pages, 16570 KB

Open Access Article

Effective Flow Ratio: A Novel Efficiency Metric for Heterogeneous Traffic in a Signalized Urban Intersection with Aerial Computer Vision

by Abu Anas Ibn Samad, Tanvir Ahmed and Md Nazmul Huda

Big Data Cogn. Comput. 2026, 10(3), 80; https://doi.org/10.3390/bdcc10030080 - 6 Mar 2026

Viewed by 1290

Abstract

Intelligent Transportation Systems (ITS) primarily rely on flow rate and occupancy to estimate traffic states. However, in heterogeneous traffic conditions characterized by weak lane discipline and diverse vehicle classes, these conventional metrics fail to capture the true operational efficiency of signalized intersections. High flow rates can mask underlying inefficiencies, while low flow rates do not necessarily indicate free-flow conditions. This paper introduces a novel computer vision-based metric, the Effective Flow Ratio (EFR), designed to quantify the actual discharge efficiency of mixed traffic. By leveraging Bird’s-Eye View (BEV) vehicle tracking using You Only Look Once version 11 (YOLOv11) and ByteTrack, EFR distinguishes between kinematic movement and effective discharge, resolving the ambiguity of “moving but not clearing” states. We analyze 21 days of continuous footage from a rooftop-mounted camera overlooking a congested intersection in Dhaka, Bangladesh, and find that EFR exhibits distinct non-linear behaviors compared with raw flow counts. Our results demonstrate that: (i) flow rate and discharge efficiency are dynamically decoupled, as evidenced by significant variance in EFR within identical flow bins; (ii) temporal rolling correlations reveal transient regimes in which traditional signal control logic would misinterpret congestion severity; and (iii) EFR provides a more robust proxy for intersection performance than occupancy or volume alone. The proposed metric offers a granular, physics-informed input for next-generation adaptive traffic signal control in developing urban environments.

(This article belongs to the Special Issue AI, Computer Vision and Human–Robot Interaction)

22 pages, 44814 KB

Open Access Article

Unordered Stacked Pillbox Detection Algorithm Based on Improved YOLOv8

by Jiahang Pan, Rui Zhou, Jie Feng, Mincheng Wu, Xiang Wu and Hui Dong

Big Data Cogn. Comput. 2025, 9(12), 300; https://doi.org/10.3390/bdcc9120300 - 26 Nov 2025

Cited by 2 | Viewed by 1223

Abstract

To enable fully automated medicine warehousing in intelligent pharmacy systems, accurately detecting disordered, stacked pillboxes is essential. This paper proposes a high-precision detection algorithm for such scenarios based on an improved YOLOv8 framework. The proposed method integrates a novel convolutional module that replaces traditional stride convolutions and pooling layers, enhancing the detection of small, low-resolution targets in computer vision tasks. To further enhance detection accuracy, the Bi-Level Routing Attention (BiFormer) Vision Transformer is incorporated as a Cognitive Computing module. Additionally, the Circular Smooth Label (CSL) technique is employed to mitigate boundary discontinuities and periodic anomalies in angle prediction, which often arise in the detection of rotated objects. The experimental results demonstrate that the proposed method achieves a precision of 94.24%, a recall of 90.39%, and a mean average precision (mAP) of 94.16%, improvements of 3.34%, 2.53%, and 3.35%, respectively, over the baseline YOLOv8 model. Moreover, the enhanced detection model outperforms existing rotated-object detection methods while maintaining real-time inference speed. To facilitate reproducibility and future benchmarking, the full dataset and source code used in this study have been released publicly. Although no standardized benchmark currently exists for pillbox detection, our self-constructed dataset reflects key industrial variations in pillbox size, orientation, and stacking, thereby providing a foundation for future cross-domain validation.
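The Circular Smooth Label idea mentioned in this abstract can be sketched as follows: the angle is discretized into bins, and instead of a one-hot target, bins near the true angle receive smoothly decaying weights computed from the circular bin distance, so angles on either side of the wrap-around point are treated as neighbors. The 180° angle period, the number of bins, the window radius, and the Gaussian window are assumptions chosen for illustration; the abstract does not state the authors' settings, and `circular_smooth_label` is a hypothetical helper name.

```python
import math


def circular_smooth_label(angle_deg, num_bins=180, radius=6, sigma=2.0):
    """Encode an angle as a circularly smoothed label vector (CSL-style sketch).

    Assumes a 180-degree angle period. The target bin gets weight 1; bins
    within `radius` of it get a Gaussian weight based on the *circular*
    bin distance, so bin 0 and bin num_bins - 1 count as adjacent.
    """
    bin_width = 180.0 / num_bins
    target = int(round(angle_deg / bin_width)) % num_bins
    label = [0.0] * num_bins
    for i in range(num_bins):
        d = abs(i - target)
        d = min(d, num_bins - d)       # wrap-around (circular) distance
        if d <= radius:
            label[i] = math.exp(-d * d / (2.0 * sigma ** 2))
    return label
```

With these defaults, an angle of 179.6° wraps into bin 0, and bins 1 and 179 receive identical weights. This removes the jump at the 0°/180° boundary that a plain angle regression or a one-hot target would have.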

(This article belongs to the Special Issue AI, Computer Vision and Human–Robot Interaction)

15 pages, 1265 KB

Open Access Article

Lightweight Multimodal Adapter for Visual Object Tracking

by Vasyl Borsuk, Vitaliy Yakovyna and Nataliya Shakhovska

Big Data Cogn. Comput. 2025, 9(11), 292; https://doi.org/10.3390/bdcc9110292 - 15 Nov 2025

Cited by 1 | Viewed by 1629

Abstract

Visual object tracking is a fundamental computer vision task recently extended to multimodal settings, where natural language descriptions complement visual information. Existing multimodal trackers typically rely on large-scale transformer architectures that jointly train visual and textual encoders, resulting in hundreds of millions of trainable parameters and substantial computational overhead. We propose a lightweight multimodal adapter that integrates textual descriptions into a state-of-the-art visual-only framework with minimal overhead. The pretrained visual and text encoders are frozen, and only a small projection network is trained to align text embeddings with visual features. The adapter is modular, can be toggled at inference, and has negligible impact on speed. Extensive experiments demonstrate that textual cues improve tracking robustness and enable efficient multimodal integration with over 100× fewer trainable parameters than heavy multimodal trackers, allowing training and deployment on resource-limited devices.
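To give a sense of scale for a small projection network trained on top of frozen encoders, the sketch below counts the trainable weights and biases of a fully connected projection. The layer sizes (a 512-dimensional text embedding, a 256-unit hidden layer, and a 768-dimensional visual feature space) are hypothetical and are not taken from the paper; under that assumption, only these parameters would be updated during training.

```python
def mlp_param_count(layer_sizes):
    """Trainable parameters (weights + biases) of a fully connected MLP.

    layer_sizes lists the width of each layer, input first, output last;
    each consecutive pair (n_in, n_out) contributes n_in * n_out weights
    plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


if __name__ == "__main__":
    # Hypothetical text -> hidden -> visual projection sizes
    print(mlp_param_count([512, 256, 768]))
```

With these illustrative sizes the projection has 328,704 trainable parameters, a small fraction of the hundreds of millions the abstract cites for jointly trained multimodal trackers.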

(This article belongs to the Special Issue AI, Computer Vision and Human–Robot Interaction)
