AI, Computer Vision and Human–Robot Interaction

Special Issue Editors


Guest Editor
Computing, AI and Cyber Security, Canterbury Christ Church University, Canterbury, UK
Interests: artificial intelligence; computer vision; explainable AI; transformer-based models; assistive robotics; ambient assisted living; autonomous systems

Guest Editor
Department of Electronic and Electrical Engineering, Brunel University London, Kingston Ln, Uxbridge UB8 3PH, UK
Interests: robotics; computer vision; mobile capsule robots; capsule endoscopy; artificial intelligence; deep learning; sensor fusion; autonomous vehicles; search and rescue robots

Guest Editor

Special Issue Information

Dear Colleagues, 

The convergence of Artificial Intelligence (AI), Computer Vision, and Human–Robot Interaction (HRI) is accelerating the development of intelligent, autonomous, and interactive robotic systems. From autonomous navigation and human–robot interaction to environmental sensing and healthcare assistance, robots must operate intelligently and safely in real-world settings. Advances in deep learning, transformer-based models, and sim-to-real learning are driving rapid improvements in perception, planning, and adaptability. However, these advances must be complemented by explainability and robust generalization, especially in domains such as Ambient Assisted Living (AAL), healthcare, swarm-based sensing, and field robotics, where reliability and trust are critical. This Special Issue highlights recent advances that empower robots to perceive, interpret, and act reliably in complex, dynamic, and human-centered environments. Core technologies of interest include transformer-based architectures, multimodal learning, transfer learning, and Explainable AI (XAI), each contributing to greater adaptability, transparency, and trust in real-world robotic applications.

We welcome contributions ranging from novel algorithms to integrated frameworks and toolkits that support development, reproducibility, and deployment. Particular interest lies in applied research across impactful domains such as Ambient Assisted Living (AAL), healthcare, autonomous off-road navigation, and swarm robotics, where coordination, safety, and interpretability are essential to success. 

This Special Issue invites interdisciplinary research on perception, learning, and interaction in intelligent robotic systems. We welcome both algorithmic innovations and deployed systems that bridge core AI with practical robotic applications.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Transformer-based models and attention mechanisms in robotic perception;
  • Multimodal learning and sensor fusion across vision, language, and tactile inputs;
  • Transfer learning and domain adaptation for visual and behavioral generalization;
  • Explainable AI (XAI) in robotic systems, especially in safety-critical domains like healthcare, AAL, and swarm robotics;
  • Vision-based human–robot interaction, including gesture, emotion, and behavior recognition;
  • Reinforcement and imitation learning for adaptive robotic behavior;
  • Sim-to-real transfer for deploying learned models in physical systems;
  • Vision-based navigation, terrain understanding, and autonomous exploration in unstructured environments;
  • Swarm robotics and AI coordination for collaborative sensing and decision-making;
  • Collaborative, assistive, and human-in-the-loop robotic systems;
  • Frameworks, toolkits, or platforms supporting perception, learning, or explainability;
  • Ethical and socially responsible AI in robotic applications. 

We look forward to receiving your contributions.  

Dr. Hannan Azhar
Dr. Md Nazmul Huda
Prof. Dr. Hongying Meng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • computer vision
  • explainable AI (XAI)
  • transformer-based models
  • transfer learning
  • sim-to-real learning
  • human–robot interaction
  • swarm robotics
  • autonomous navigation
  • assistive robotics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)


Research

25 pages, 16570 KB  
Article
Effective Flow Ratio: A Novel Efficiency Metric for Heterogeneous Traffic in a Signalized Urban Intersection with Aerial Computer Vision
by Abu Anas Ibn Samad, Tanvir Ahmed and Md Nazmul Huda
Big Data Cogn. Comput. 2026, 10(3), 80; https://doi.org/10.3390/bdcc10030080 - 6 Mar 2026
Viewed by 825
Abstract
Intelligent Transportation Systems (ITS) primarily rely on flow rate and occupancy to estimate traffic states. However, in heterogeneous traffic conditions characterized by weak lane discipline and diverse vehicle classes, these conventional metrics fail to capture the true operational efficiency of signalized intersections. High flow rates can mask underlying inefficiencies, while low flow rates do not necessarily indicate free-flow conditions. This paper introduces a novel computer vision-based metric, the Effective Flow Ratio (EFR), designed to quantify the actual discharge efficiency of mixed traffic. By leveraging Bird’s-Eye View (BEV) vehicle tracking using You Only Look Once version 11 (YOLOv11) and ByteTrack, EFR distinguishes between kinematic movement and effective discharge, resolving the ambiguity of “moving but not clearing” states. We analyze 21 days of continuous footage from a rooftop-mounted camera overlooking a congested intersection in Dhaka, Bangladesh, exhibiting distinct non-linear behaviors compared to raw flow counts. Our results demonstrate that: (i) Flow rate and discharge efficiency are dynamically decoupled, evidenced by significant variance in EFR within identical flow bins; (ii) Temporal rolling correlations reveal transient regimes where traditional signal control logic would misinterpret congestion severity; and (iii) EFR provides a more robust proxy for intersection performance than occupancy or volume alone. The proposed metric offers a granular, physics-informed input for next-generation adaptive traffic signal control in developing urban environments.
(This article belongs to the Special Issue AI, Computer Vision and Human–Robot Interaction)

22 pages, 44814 KB  
Article
Unordered Stacked Pillbox Detection Algorithm Based on Improved YOLOv8
by Jiahang Pan, Rui Zhou, Jie Feng, Mincheng Wu, Xiang Wu and Hui Dong
Big Data Cogn. Comput. 2025, 9(12), 300; https://doi.org/10.3390/bdcc9120300 - 26 Nov 2025
Viewed by 973
Abstract
To enable fully automated medicine warehousing in intelligent pharmacy systems, accurately detecting disordered, stacked pillboxes is essential. This paper proposes a high-precision detection algorithm for such scenarios based on an improved YOLOv8 framework. The proposed method integrates a novel convolutional module that replaces traditional stride convolutions and pooling layers, enhancing the detection of small, low-resolution targets in computer vision tasks. To further enhance detection accuracy, the Bi-Level Routing Attention (BiFormer) Vision Transformer is incorporated as a Cognitive Computing module. Additionally, the Circular Smooth Label (CSL) technique is employed to mitigate boundary discontinuities and periodic anomalies in angle prediction, which often arise in the detection of rotated objects. The experimental results demonstrate that the proposed method achieves a precision of 94.24%, a recall of 90.39%, and a mean average precision (mAP) of 94.16%, improvements of 3.34%, 2.53%, and 3.35%, respectively, over the baseline YOLOv8 model. Moreover, the enhanced detection model outperforms existing rotated-object detection methods while maintaining real-time inference speed. To facilitate reproducibility and future benchmarking, the full dataset and source code used in this study have been released publicly. Although no standardized benchmark currently exists for pillbox detection, our self-constructed dataset reflects key industrial variations in pillbox size, orientation, and stacking, thereby providing a foundation for future cross-domain validation.

15 pages, 1265 KB  
Article
Lightweight Multimodal Adapter for Visual Object Tracking
by Vasyl Borsuk, Vitaliy Yakovyna and Nataliya Shakhovska
Big Data Cogn. Comput. 2025, 9(11), 292; https://doi.org/10.3390/bdcc9110292 - 15 Nov 2025
Cited by 1 | Viewed by 1480
Abstract
Visual object tracking is a fundamental computer vision task recently extended to multimodal settings, where natural language descriptions complement visual information. Existing multimodal trackers typically rely on large-scale transformer architectures that jointly train visual and textual encoders, resulting in hundreds of millions of trainable parameters and substantial computational overhead. We propose a lightweight multimodal adapter that integrates textual descriptions into a state-of-the-art visual-only framework with minimal overhead. The pretrained visual and text encoders are frozen, and only a small projection network is trained to align text embeddings with visual features. The adapter is modular, can be toggled at inference, and has negligible impact on speed. Extensive experiments demonstrate that textual cues improve tracking robustness and enable efficient multimodal integration with over 100× fewer trainable parameters than heavy multimodal trackers, allowing training and deployment on resource-limited devices.
