Multimodal Technol. Interact., Volume 9, Issue 5 (May 2025) – 12 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
11 pages, 12024 KiB  
Article
Computer Vision-Based Obstacle Detection Mobile System for Visually Impaired Individuals
by Gisel Katerine Bastidas-Guacho, Mario Alejandro Paguay Alvarado, Patricio Xavier Moreno-Vallejo, Patricio Rene Moreno-Costales, Nayely Samanta Ocaña Yanza and Jhon Carlos Troya Cuestas
Multimodal Technol. Interact. 2025, 9(5), 48; https://doi.org/10.3390/mti9050048 - 18 May 2025
Abstract
Traditional tools, such as canes, are no longer enough to support the mobility and orientation of visually impaired people in complex environments. Therefore, technological solutions based on computer vision tasks are promising alternatives for detecting obstacles. Object detection models are easy to couple to mobile systems, do not require large resource consumption on mobile phones, and act in real time to alert users to the presence of obstacles. However, existing object detectors were mostly trained with images from platforms such as Kaggle, and the number of covered object classes is still limited. For this reason, this study implements a mobile system that integrates an object detection model to identify obstacles for visually impaired people. Additionally, the mobile application integrates multimodal feedback through auditory and haptic interaction, ensuring that users receive real-time obstacle alerts via voice guidance and vibrations, further enhancing accessibility and responsiveness in different navigation contexts. The chosen scenario for developing the obstacle detection application is the Specialized Educational Unit Dr. Luis Benavides for impaired people, which provided the images for building the model's dataset and for evaluating it with impaired individuals. To determine the best model, the performance of YOLO was evaluated by varying the number of training epochs, using a proprietary dataset of 7600 diverse images. The YOLO-300 model performed best, with a mAP of 0.42. Full article
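The detection-to-alert logic such a system might use can be sketched as follows. This is a minimal illustration, not the authors' implementation: the detections, thresholds, and labels below are invented for the example, and a real app would obtain detections from the on-device YOLO model.

```python
# Illustrative sketch: mapping object detections to multimodal (voice/vibration)
# alerts. Detections are mocked as dicts; thresholds are invented for the example.

def alert_for_detection(det, frame_area):
    """Choose an alert channel from the obstacle's apparent size in the frame."""
    x1, y1, x2, y2 = det["bbox"]
    coverage = ((x2 - x1) * (y2 - y1)) / frame_area
    if coverage > 0.25:          # obstacle fills much of the frame: very close
        return ("voice+vibration", f"{det['label']} ahead, stop")
    if coverage > 0.05:          # moderate apparent size: approaching
        return ("vibration", f"{det['label']} approaching")
    return (None, None)          # too small/far to warrant an alert

detections = [
    {"label": "chair", "bbox": (100, 100, 500, 500), "score": 0.9},
    {"label": "door",  "bbox": (10, 10, 60, 60),     "score": 0.6},
]
for det in detections:
    channel, message = alert_for_detection(det, frame_area=640 * 480)
    if channel:
        print(channel, "->", message)
```

A production system would debounce repeated alerts and prioritize the largest or most central detection per frame.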

27 pages, 1259 KiB  
Article
The Influence of the Labeling Effect on the Perception of Command Execution Delay in Gaming
by Duy H. Nguyen and Peter A. Kara
Multimodal Technol. Interact. 2025, 9(5), 47; https://doi.org/10.3390/mti9050047 - 15 May 2025
Viewed by 189
Abstract
Gaming is one of the largest industries of digital entertainment. Modern gaming software may be susceptible to command execution delay, which may be caused by various factors, such as insufficient rendering capabilities or limited network resources. At the time of writing, advances in gaming are often accompanied by brief descriptions when communicated to users. While such descriptions may be compressed into a couple of words, even a single word may impact user experience. Due to the cognitive bias induced by the labeling effect, the impact of such a word may actually be more significant than what the user genuinely perceives. In this paper, we investigate the influence of the labeling effect on the perception of command execution delay in gaming. We carried out a series of subjective tests to measure how the word “optimized” affects gaming experience. The test variables of our experiment were the added delay between command and execution, the speed of the game, and the label assigned to gaming sequences. The test participants were tasked with directly comparing gaming sequences with different labels assigned: “optimized” and “not optimized”. In every comparison, both sequences had the same objective characteristics; only the label differed. The experiment was conducted on single-input and continuous-input computer games that we developed for this research. The obtained results indicate that for both of these input types, the labeling effect has a statistically significant impact on perceived delay. Overall, more than 70% of the subjective ratings were affected by the assigned labels. Moreover, there is a strong correlation between the amount of delay and the effect of cognitive bias. The speed of the game also affected the obtained results, yet statistically significant differences were only measured between the slowest and the fastest gameplay. Full article

15 pages, 3211 KiB  
Article
Optimizing HUD-EVS Readability: Effects of Hue, Saturation and Lightness on Information Recognition
by Xuyi Qiu
Multimodal Technol. Interact. 2025, 9(5), 46; https://doi.org/10.3390/mti9050046 - 14 May 2025
Viewed by 135
Abstract
The Enhanced Vision System (EVS) offers a display advantage that conventional devices lack, enabling interface information to be overlaid on real-world imagery. However, information overload, especially in complex environments, can reduce the recognizability of important information and impair decision-making. This study investigates a dual color-coding strategy to optimize the recognizability of Primary Information (PI) and Secondary Information (SI) in Head-Up Display–Enhanced Vision Systems (HUD-EVS) against complex backgrounds. The results show that adjusting the hue, saturation, and lightness of SI affects the recognizability of both PI and SI. Specifically, certain saturation (20% or 80%) and lightness (60%) combinations should be avoided to ensure PI prominence and maintain sufficient recognizability for SI. These findings provide insights for designing color-coding strategies for EVS, enhancing the recognizability of information on mobile devices. Full article

34 pages, 10532 KiB  
Article
Personalized and Timely Feedback in Online Education: Enhancing Learning with Deep Learning and Large Language Models
by Óscar Cuéllar, Manuel Contero and Mauricio Hincapié
Multimodal Technol. Interact. 2025, 9(5), 45; https://doi.org/10.3390/mti9050045 - 14 May 2025
Viewed by 187
Abstract
This study investigates an Adaptive Feedback System (AFS) that integrates deep learning (a recurrent neural network trained with historical student data) and GPT-4 to provide personalized feedback in a Digital Art course. In a quasi-experimental design, the intervention group (n = 42) received weekly feedback generated from model predictions, while the control group (n = 39) followed the same program without this intervention across four learning blocks or levels. The results revealed (1) a cumulative effect with a significant performance difference in the fourth learning block (+12.63 percentage points); (2) a reduction in performance disparities between students with varying levels of prior knowledge in the experimental group (−56.5%) versus an increase in the control group (+103.3%); (3) an “overcoming effect” where up to 42.9% of students surpassed negative performance predictions; and (4) a positive impact on active participation, especially in live class attendance (+30.21 points) and forum activity (+9.79 points). These findings demonstrate that integrating deep learning with LLMs can significantly improve learning outcomes in online educational environments, particularly for students with limited prior knowledge. Full article

20 pages, 1108 KiB  
Article
LLMs in Education: Evaluating GPT and BERT Models in Student Comment Classification
by Anabel Pilicita and Enrique Barra
Multimodal Technol. Interact. 2025, 9(5), 44; https://doi.org/10.3390/mti9050044 - 12 May 2025
Viewed by 288
Abstract
The incorporation of artificial intelligence in educational contexts has significantly transformed the support provided to students facing learning difficulties, facilitating both the management of their educational process and their emotions. Additionally, online comments play a vital role in understanding student feelings. Analyzing comments on social media platforms can help identify students in vulnerable situations so that timely interventions can be implemented. However, manually analyzing student-generated content on social media platforms is challenging due to the large amount of data and the frequency with which it is posted. In this context, the recent revolution in artificial intelligence, marked by the implementation of powerful large language models (LLMs), may contribute to the classification of student comments. This study compared the effectiveness of a supervised learning approach using five different LLMs: bert-base-uncased, roberta-base, gpt-4o-mini-2024-07-18, gpt-3.5-turbo-0125, and gpt-neo-125m. The evaluation was carried out after fine-tuning them specifically to classify student comments on social media platforms with anxiety/depression or neutral labels. The results obtained were as follows: gpt-4o-mini-2024-07-18 and gpt-3.5-turbo-0125 obtained 98.93%, roberta-base 98.14%, bert-base-uncased 97.13%, and gpt-neo-125m 96.43%. Overall, all five LLMs performed well in this classification task. Full article

22 pages, 5933 KiB  
Article
Education 4.0 for Industry 4.0: A Mixed Reality Framework for Workforce Readiness in Manufacturing
by Andrea Bondin and Joseph Paul Zammit
Multimodal Technol. Interact. 2025, 9(5), 43; https://doi.org/10.3390/mti9050043 - 9 May 2025
Viewed by 175
Abstract
The rapid emergence of Industry 4.0 technologies has transformed manufacturing, requiring a workforce skilled in automation, data-driven decision-making, and process optimisation. While traditional education includes structured formats such as lectures and tutorials, it may not always equip graduates with the hands-on expertise demanded by modern industrial challenges. This study presents a Mixed Reality (MR)-based educational framework that promotes interactive experiences to enhance students’ engagement with and understanding of Industry 4.0 concepts, aiming to bridge the skills gap through immersive Virtual Learning Factories (VLFs). The framework was developed using a mixed-methods approach, combining qualitative feedback with quantitative benchmarking. A proof-of-concept MR application was developed and tested at the (Anonymised), simulating Industry 4.0 scenarios in an engineering education context to validate the framework. The findings indicate that MR-based learning improved students’ engagement with the academic content, leading to better knowledge retention and deeper conceptual understanding. The students also demonstrated enhanced problem-solving, process optimisation, and adaptability compared to traditional methods. The immersive nature of MR provided an interactive, context-rich environment that fostered active learning. This research highlights MR’s potential as a transformative educational tool, aligning academic training with industry needs. Future research is recommended to evaluate the framework’s scalability and long-term effectiveness. Full article

25 pages, 1517 KiB  
Article
Towards Structured Gaze Data Classification: The Gaze Data Clustering Taxonomy (GCT)
by Yahdi Siradj, Kiki Maulana Adhinugraha and Eric Pardede
Multimodal Technol. Interact. 2025, 9(5), 42; https://doi.org/10.3390/mti9050042 - 3 May 2025
Viewed by 273
Abstract
Gaze data analysis plays a crucial role in understanding human visual attention and behaviour. However, raw gaze data is often noisy and lacks inherent structure, making interpretation challenging. Therefore, preprocessing techniques such as classification are essential to extract meaningful patterns and improve the reliability of gaze-based analysis. This study introduces the Gaze Data Clustering Taxonomy (GCT), a novel approach that categorises gaze data into structured clusters to improve its reliability and interpretability. GCT classifies gaze data based on cluster count, target presence, and spatial–temporal relationships, allowing for more precise gaze-to-target association. We utilise several machine learning techniques, such as k-NN, k-Means, and DBScan, to apply the taxonomy to a Random Saccade Task dataset, demonstrating its effectiveness in gaze classification. Our findings highlight how clustering provides a structured approach to gaze data preprocessing by distinguishing meaningful patterns from unreliable data. Full article
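The cluster-count and noise distinctions that such a taxonomy relies on can be illustrated with a minimal DBSCAN over synthetic 2-D gaze samples. This is a sketch only: the GCT implementation, its parameters, and its Random Saccade Task data are not reproduced here, and the points below are invented.

```python
# Illustrative sketch: clustering synthetic 2-D gaze samples with a minimal
# DBSCAN, separating fixation-like clusters from noise samples.
import math

def dbscan(points, eps, min_pts):
    """Return a cluster label per point (-1 = noise)."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (may be claimed later as border)
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:                  # expand the cluster from core points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point: joins, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbours(j)
            if len(jn) >= min_pts:
                seeds.extend(jn)
    return labels

# Two fixation-like clouds plus one stray sample.
gaze = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
print(dbscan(gaze, eps=2.0, min_pts=2))
```

With real gaze data, `eps` and `min_pts` would be tuned to the tracker's sampling rate and noise level, and each resulting cluster could then be associated with the nearest target.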

14 pages, 1689 KiB  
Article
Evaluating the Effectiveness of Tilt Gestures for Text Property Control in Mobile Interfaces
by Sang-Hwan Kim and Xuesen Liu
Multimodal Technol. Interact. 2025, 9(5), 41; https://doi.org/10.3390/mti9050041 - 29 Apr 2025
Viewed by 312
Abstract
The objective of this study is to verify the usability of gesture interactions such as tilting or shaking, rather than conventional touch gestures, on mobile devices. To this end, a prototype was developed that manipulates the text size in a mobile text messaging application through tilt gestures. In the text input interface, three types of tilt gesture interaction methods (‘Shaking’, ‘Leaning’, and ‘Acceleration’) were implemented to select the text size level among five levels (extra-small, small, normal, large, and extra-large). Along with the gesture-based interaction methods, the conventional button method was also evaluated. A total of 24 participants were asked to prepare text messages of specified font sizes using randomly assigned interaction methods to select the font size. Task completion time, accuracy (setting errors and input errors), workload, and subjective preferences were collected and analyzed. As a result, the ‘Shaking’ method performed similarly to the conventional button method and outperformed the ‘Leaning’ and ‘Acceleration’ methods. This may be because ‘Leaning’ and ‘Acceleration’ are continuous operations, while ‘Shaking’ is a discrete operation for each menu item (font size level). According to subjective comments, tilt gestures on mobile devices can be useful once users take the time to learn them, and can also provide ways to convey intentions along with simple text. Although tilt gestures were not found to significantly improve text editing performance compared to conventional screen touch methods, motion gestures beyond touch can be considered for interface manipulations such as app navigation, gaming, or multimedia controls across diverse applications. Full article

20 pages, 3083 KiB  
Article
Exploring an Emoji-Based Evaluation and Intervention Method for Psychological Safety in Ongoing Co-Creations
by Qiner Lyu, Gaku Kutsuzawa, Hiroyuki Umemura, Kenta Kimura, Masaaki Mochimaru and Akihiko Murai
Multimodal Technol. Interact. 2025, 9(5), 40; https://doi.org/10.3390/mti9050040 - 24 Apr 2025
Viewed by 938
Abstract
Psychological safety is pivotal for co-creation to build an open environment where innovative ideas can flourish. Traditionally, psychological safety has been evaluated from a stable and long-term perspective by implementing psychological scales. Consequently, existing interventions often focus on steadily enhancing psychological safety, which is less suitable for dynamic short-term co-creation settings. The purpose of this study is to introduce the use of emojis as a novel and intuitive interaction during co-creations and assess their effectiveness in evaluating and influencing psychological safety. We performed two experiments with 140 participants in total to test emojis as evaluations and interventions, respectively. The participants watched videos and annotated them with emojis based on their perceptions of emotions. This process allowed us to explore the relationship between perceived emotions and psychological safety. In the next phase, we embedded emojis directly into the videos to observe whether the participants’ emotional perceptions—and, consequently, their psychological safety—could be influenced by visual cues. Our findings demonstrate that positive emojis are positively correlated with psychological safety and negative emojis are negatively correlated with psychological safety. We also revealed that negative emojis significantly decreased psychological safety scores, whereas positive emojis did not lead to a corresponding increase. Full article

22 pages, 4770 KiB  
Article
Internet of Things and Artificial Intelligence for Secure and Sustainable Green Mobility: A Multimodal Data Fusion Approach to Enhance Efficiency and Security
by Manuel J. C. S. Reis
Multimodal Technol. Interact. 2025, 9(5), 39; https://doi.org/10.3390/mti9050039 - 24 Apr 2025
Viewed by 419
Abstract
The increasing complexity of urban mobility systems demands innovative solutions to address challenges such as traffic congestion, energy inefficiency, and environmental sustainability. This paper proposes an IoT and AI-driven framework for secure and sustainable green mobility, leveraging multimodal data fusion to enhance traffic management, energy efficiency, and emissions reduction. Using publicly available datasets, including METR-LA for traffic flow and OpenWeatherMap for environmental context, the framework integrates machine learning models for congestion prediction and reinforcement learning for dynamic route optimization. Simulation results demonstrate a 20% reduction in travel time, 15% energy savings per kilometer, and a 10% decrease in CO2 emissions compared to baseline methods. The modular architecture of the framework allows for scalability and adaptability across various smart city applications, including traffic management, energy grid optimization, and public transit coordination. These findings underscore the potential of IoT and AI technologies to revolutionize urban transportation, contributing to more efficient, secure, and sustainable mobility systems. Full article
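The dynamic route-optimization idea can be illustrated with a toy congestion-weighted shortest-path search. This is a deliberate simplification for illustration: the paper uses reinforcement learning over METR-LA traffic data, whereas the graph, travel times, and congestion factors below are invented, with a predicted congestion factor standing in for the learned model's output.

```python
# Illustrative sketch: congestion-aware route selection on a toy road graph.
# Edge travel times are scaled by a predicted congestion factor, standing in
# for a learned congestion model; routes are found with Dijkstra's algorithm.
import heapq

def best_route(graph, congestion, start, goal):
    """Dijkstra over travel_time * congestion_factor per edge."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, base_time in graph.get(node, []):
            if nxt not in seen:
                weight = base_time * congestion.get((node, nxt), 1.0)
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

graph = {
    "A": [("B", 5), ("C", 7)],
    "B": [("D", 5)],
    "C": [("D", 4)],
}
congestion = {("A", "B"): 2.0}   # heavy predicted traffic on A->B
print(best_route(graph, congestion, "A", "D"))
```

With a congestion factor of 2.0 on A→B, the nominally longer route via C becomes the cheaper choice; re-running the search as predictions update gives the dynamic rerouting behavior the abstract describes.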

36 pages, 12483 KiB  
Article
Environments That Boost Creativity: AI-Generated Living Geometry
by Nikos A. Salingaros
Multimodal Technol. Interact. 2025, 9(5), 38; https://doi.org/10.3390/mti9050038 - 23 Apr 2025
Viewed by 337
Abstract
Generative AI leads to designs that prioritize cognition, emotional resonance, and health, thus offering a tested alternative to current trends. In a first AI experiment, the large language model ChatGPT-4o generated six visual environments that are expected to boost creative thinking for their occupants. The six test cases are evaluated using Christopher Alexander’s 15 fundamental properties of living geometry as criteria, as well as ChatGPT-4o, to reveal a strong positive correlation. Living geometry is a specific type of geometry that shows coherence across scales, fractal structure, and nested symmetries to harmonize with human neurophysiology. The human need for living geometry is supported by interdisciplinary evidence from biology, environmental psychology, and neuroscience. Then, in a second AI experiment, ChatGPT-4o was asked to generate visual environments that suppress creativity for comparison with the cases that boost creative thinking. Checking these negative examples using Alexander’s 15 fundamental properties, they are almost entirely deficient in living geometry, thus confirming the diagnostic model. Used together with generative AI, living geometry therefore offers a useful method for both creating and evaluating designs based on objective criteria. Adopting a hybrid epistemological framework of AI plus living geometry as a basis for design uncovers a flaw within contemporary architectural practice. Dominant design styles, rooted in untested aesthetic preferences, lack the empirical validation required to address fundamental questions of spatial quality responsible for human creativity. Full article

36 pages, 18792 KiB  
Article
VICTORIOUS: A Visual Analytics System for Scoping Review of Document Sets
by Amir Haghighati, Amir Reza Haghverdi and Kamran Sedig
Multimodal Technol. Interact. 2025, 9(5), 37; https://doi.org/10.3390/mti9050037 - 22 Apr 2025
Viewed by 303
Abstract
Scoping review is an iterative knowledge synthesis methodology concerned with broad questions about the nature of a research subject. The increasingly large number of published documents in scholarly domains poses challenges in conducting scoping reviews. Despite attempts to address these challenges, the specific step of sensemaking in the context of scoping reviews is seldom addressed. We address sensemaking of a curated document collection by developing a VIsual analytiCs sysTem for scOping RevIew of dOcUment Sets (VICTORIOUS). Using known methods within the machine learning community, we propose and develop six modules within VICTORIOUS: Map, Summary, Skim, SemJump, BiblioNetwork, and Compare. To demonstrate the utility of VICTORIOUS, we describe three usage scenarios. We conclude with a qualitative comparison of VICTORIOUS and other available systems. While existing systems leave users with isolated information items about a document set, making an aggregated assessment in a scoping review a challenge, VICTORIOUS shows promise for making sense of documents in the scoping review process. Full article
