In recent years, artificial intelligence (AI) has become embedded in everyday life, empowering systems with the capacity to learn, adapt, and make decisions autonomously. This technology is transforming the way individuals and teams interact with the world by mediating tasks in areas that range from healthcare [1] to music [2,3] and agriculture [4]. As the field of human–AI interaction continues to mature and companies increasingly invest in AI innovations as a core component of their competitive strategy, there is a growing need for novel human-centered AI agents, models, algorithms, tools, and interfaces that can effectively leverage hybrid intelligence solutions [5]. This Special Issue contributes to this scholarly and professional debate by presenting cutting-edge research in the intersectional space of human–AI interaction, an emerging research domain focused on how humans use AI-infused systems to augment their experiences and achieve better outcomes through the generative capacity of these systems and the contextualized meanings they acquire in practical use [6]. The Special Issue comprises five original articles and one review that describe important achievements in real-world settings. As guest editors, we hope that the selected contributions will foster the development of a research stream and open new avenues for further investigation in the multifaceted domain of human interaction with intelligent systems.
In response to the growing interest in human–robot interaction (HRI) control interfaces, the first article in this issue, written by Qu, Jarosz, and Sniezynski and titled “Robot Control Platform for Multimodal Interactions with Humans Based on ChatGPT”, explores pre- and post-use perceptions of an HRI control platform that leverages a large language model (LLM) to enable new conversational possibilities in HRI contexts. Using the Pepper humanoid robot, the authors developed and tested a multimodal architecture in an experiment involving 20 participants. The proposed platform proved effective in reducing participants’ anxiety about the robot’s communication abilities and behavioral actions. These findings open a new space for investigating different user populations, interface designs, and HRI implementations that incorporate LLMs and cloud computing in robotics.
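To make the general pattern concrete, the following is a minimal Python sketch of an LLM-mediated dialogue turn of the kind such a platform might implement. It is an illustration under stated assumptions, not the authors’ architecture: the prompt format and the action-tag convention are hypothetical, and any robot-side speech or gesture services (e.g., Pepper’s middleware) would be called with the returned values; only the OpenAI chat call reflects a real library interface.

```python
# Minimal sketch of one turn in an LLM-mediated human-robot dialogue.
# The system prompt and the "ACTION:" tag convention are hypothetical
# illustrations; only the OpenAI client call is a real API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system",
            "content": "You are a helpful humanoid robot. Reply with a short "
                       "utterance and, on a new line, an action tag such as "
                       "ACTION: wave."}]

def dialogue_step(user_utterance: str) -> tuple[str, str | None]:
    """Send one user turn to the LLM and split the reply into
    spoken text and an optional action tag for the robot."""
    history.append({"role": "user", "content": user_utterance})
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=history)
    reply = response.choices[0].message.content or ""
    history.append({"role": "assistant", "content": reply})
    text, _, action = reply.partition("ACTION:")
    return text.strip(), (action.strip() or None)

# A robot controller would then pass `text` to text-to-speech and map
# `action` onto a predefined gesture.
```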
The second original article, by Liu and co-authors and titled “A Real-Time Detection of Pilot Workload Using Low-Interference Devices”, addresses the problem of flight workload by proposing a real-time ensemble model that detects workload levels from physiological data. The authors tested their model on three different tasks in a cross-pilot experiment and observed an overall benefit of this low-interference approach compared to traditional detection methods that rely on patch-based devices and other invasive techniques. In particular, the findings indicate improvements in real-time prediction accuracy when flight ability metrics are incorporated. Nonetheless, the authors emphasize that further research in real flight settings is necessary to obtain more precise results, since simulation devices still fail to fully capture the pressure that pilots experience under realistic flight conditions.
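As an illustration of the general technique rather than the authors’ exact model, the sketch below assembles a soft-voting ensemble over physiological features with scikit-learn. The feature set, synthetic data, and three-level labels are all hypothetical stand-ins.

```python
# Illustrative soft-voting ensemble for three-level workload detection
# from physiological features; the features and data are synthetic and
# hypothetical, not the authors' dataset or exact model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))     # e.g., heart rate, HRV, EDA, respiration
y = rng.integers(0, 3, size=300)  # workload level: low / medium / high

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ],
    voting="soft",  # average predicted class probabilities
)
ensemble.fit(X[:240], y[:240])
print("held-out accuracy:", ensemble.score(X[240:], y[240:]))
```

In a real-time setting, the fitted ensemble would be applied to a sliding window of sensor readings rather than a static test split.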
The third study, titled “Open-Source Robotic Study Companion with Multimodal Human–Robot Interaction to Improve the Learning Experience of University Students” and written by Baksh, Zorec, and Kruusamäe, introduces an open-source robot that utilizes the multimodal capabilities of AI to provide personalized learning support. The authors outline the entire development process, which began with the identification of six key requirements derived from the literature. The prototype’s design and implementation incorporated two principles of the Technology Acceptance Model (TAM) and were followed by a user evaluation involving eight students. The study reports participants’ preferences for speech and touch interaction modalities and highlights considerations related to privacy and affordability. However, further exploration is needed to assess how this approach can enhance learning experiences in cross-cultural settings.
The fourth article, by Benito and Barrientos and titled “An Intelligent Human–Machine Interface Architecture for Long-Term Remote Robot Handling in Fusion Reactor Environments”, proposes an adaptive software architecture for remote operations in nuclear fusion environments. The architecture demonstrated positive effects on the autonomous operation of agents under human oversight. The modular flexibility provided by the model, which combines approaches ranging from user-focused (anthropomorphic) interfaces to AI-based predictive components, contributes to system robustness and enables intelligent behaviors tailored to the scenario at hand. The model also ensures interoperability across different industrial facilities. This work therefore addresses the lack of standardization and adaptability in remote handling settings, providing a maintainable framework that is responsive to technological change.
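A minimal sketch of the kind of plug-in composition such a modular architecture enables is shown below. The module names and the event format are hypothetical and illustrate the general pattern, not the authors’ implementation: interchangeable interface and predictive components behind a common contract, so modules can be swapped per facility or scenario.

```python
# Hypothetical plug-in pattern for a modular human-machine interface:
# interchangeable components behind one contract, composed by a supervisor.
from abc import ABC, abstractmethod

class HMIModule(ABC):
    @abstractmethod
    def handle(self, event: dict) -> dict:
        """Consume an operator/robot event and return it annotated."""

class PredictiveComponent(HMIModule):
    def handle(self, event: dict) -> dict:
        # Placeholder for an AI-based prediction (e.g., failure risk).
        event["predicted_risk"] = 0.1 if event.get("status") == "nominal" else 0.7
        return event

class AnthropomorphicInterface(HMIModule):
    def handle(self, event: dict) -> dict:
        event["display"] = f"Operator view: {event.get('status', 'unknown')}"
        return event

class Supervisor:
    """Routes each event through whichever modules are currently plugged in."""
    def __init__(self, modules: list[HMIModule]):
        self.modules = modules

    def process(self, event: dict) -> dict:
        for module in self.modules:
            event = module.handle(event)
        return event

pipeline = Supervisor([PredictiveComponent(), AnthropomorphicInterface()])
print(pipeline.process({"status": "nominal"}))
```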
The fifth original article, written by Zhang and colleagues and titled “Translating Words to Worlds: Zero-Shot Synthesis of 3D Terrain from Textual Descriptions Using Large Language Models”, introduces an LLM-based diffusion method that uses a Gaussian-Voronoi map data structure and a chain-of-thought tree to generate terrain heightmaps. The proposed framework allows terrain outputs to be updated in response to revised textual input. Experimental results demonstrated improvements in depth and realism for text-to-terrain generation. Consequently, this approach represents a valuable resource for text-to-data synthesis in three-dimensional spaces.
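To convey the flavor of a Gaussian-Voronoi height representation, the sketch below is a simplified, hypothetical reconstruction (not the authors’ data structure or code): each Voronoi seed carries a target height, and per-pixel heights are blended with a Gaussian falloff so that cell boundaries remain smooth.

```python
# Simplified, hypothetical Gaussian-Voronoi heightmap: seed points carry
# target heights that are blended with Gaussian falloff over the grid.
import numpy as np

def gaussian_voronoi_heightmap(size=256, n_seeds=12, sigma=40.0, seed=0):
    rng = np.random.default_rng(seed)
    seeds = rng.uniform(0, size, size=(n_seeds, 2))   # seed coordinates
    heights = rng.uniform(0.0, 1.0, size=n_seeds)     # per-seed target height

    ys, xs = np.mgrid[0:size, 0:size]
    grid = np.stack([xs, ys], axis=-1).astype(float)  # (size, size, 2)

    # Squared distance from every pixel to every seed: (size, size, n_seeds)
    d2 = ((grid[..., None, :] - seeds) ** 2).sum(axis=-1)
    weights = np.exp(-d2 / (2 * sigma**2))            # Gaussian falloff
    weights /= weights.sum(axis=-1, keepdims=True) + 1e-12

    return (weights * heights).sum(axis=-1)           # blended heightmap

hm = gaussian_voronoi_heightmap()
print(hm.shape, float(hm.min()), float(hm.max()))
```

In the paper’s pipeline, the seed placements and heights would come from LLM-interpreted textual descriptions rather than random sampling.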
The last article is a review by Qiu, Liu, and Zhao, titled “A Review of Brain–Computer Interface-Based Language Decoding: From Signal Interpretation to Intelligent Communication”. The authors examine state-of-the-art approaches in the field of brain–computer interfaces (BCIs), with a focus on recent advances and emerging possibilities in language decoding. The review emphasizes the interplay between computational methodologies, cognitive models of language processing, and neural language decoding. Moreover, it identifies three key paradigmatic shifts toward more personalized, multimodal, and deep learning (DL)-based systems and architectures. The authors argue that these developments have practical implications for enhancing accuracy and stability, reducing latency, and facilitating user adaptability when engaging with brain–computer interfaces.