MDPI - Publisher of Open Access Journals

17 pages, 2230 KiB

Open Access Article

Enhancing Diffusion-Based Music Generation Performance with LoRA

by Seonpyo Kim, Geonhui Kim, Shoki Yagishita, Daewoon Han, Jeonghyeon Im and Yunsick Sung

Appl. Sci. 2025, 15(15), 8646; https://doi.org/10.3390/app15158646 (registering DOI) - 5 Aug 2025

Recent progress in generative artificial intelligence has significantly advanced the field of text-to-music generation, enabling users to create music from natural language descriptions. Despite the success of various models, such as MusicLM, MusicGen, and AudioLDM, the current approaches struggle to capture fine-grained genre-specific characteristics, precisely control musical attributes, and handle underrepresented cultural data. This paper introduces a novel, lightweight fine-tuning method for the AudioLDM framework using low-rank adaptation (LoRA). By updating only selected attention and projection layers, the proposed method enables efficient adaptation to musical genres with limited data and computational cost. The proposed method enhances controllability over key musical parameters such as rhythm, emotion, and timbre, while maintaining the overall quality of music generation. This paper represents the first application of LoRA in AudioLDM, offering a scalable solution for fine-grained, genre-aware music generation and customization. The experimental results demonstrate that the proposed method improves the semantic alignment and statistical similarity compared with the baseline. The contrastive language–audio pretraining score increased by 0.0498, indicating enhanced text-music consistency. The kernel audio distance score decreased by 0.8349, reflecting improved similarity to real music distributions. The mean opinion score ranged from 3.5 to 3.8, confirming the perceptual quality of the generated music.
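The low-rank update at the heart of LoRA can be illustrated with a minimal sketch. This is not the authors' AudioLDM code: the matrix sizes, rank `r`, and scaling factor `alpha` below are hypothetical, and plain Python lists stand in for a tensor library. The frozen weight `W` is left untouched while only the two small matrices `A` and `B` would be trained, giving an effective weight `W + (alpha / r) * B @ A`.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the shared inner rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Trainable parameter counts for full fine-tuning and for LoRA."""
    return d_out * d_in, r * (d_in + d_out)

rng = random.Random(0)
d_out, d_in, r, alpha = 8, 8, 2, 4.0  # hypothetical sizes, not from the paper
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
b = [[0.0] * r for _ in range(d_out)]                               # trainable, zero-initialised

w_eff = lora_effective_weight(w, a, b, alpha)
full, lora = trainable_params(d_out, d_in, r)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the pretrained one, and the number of trainable parameters grows with `r * (d_in + d_out)` rather than `d_in * d_out`.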

(This article belongs to the Special Issue Recent Advances in AI Convergence: Innovations at the Crossroads of Disciplines)

23 pages, 1205 KiB

Open Access Article

Uncovering Emotional and Identity-Driven Dimensions of Entertainment Consumption in a Transitional Digital Culture

by Ștefan Bulboacă, Gabriel Brătucu, Eliza Ciobanu, Ioana Bianca Chițu, Cristinel Petrișor Constantin and Radu Constantin Lixăndroiu

Behav. Sci. 2025, 15(8), 1049; https://doi.org/10.3390/bs15081049 - 1 Aug 2025

Viewed by 258

Abstract

This study explores entertainment consumption patterns in Romania, a transitional digital culture characterized by high digital connectivity but underdeveloped physical infrastructure. Employing a dual qualitative coding methodology, this research combines inductive analysis of consumer focus groups with deductive analysis of expert interviews, enabling a multi-layered interpretation of both overt behaviors and latent emotional drivers. Seven key thematic dimensions (motivational depth, perceived barriers, emotional needs, clarity of preferences, future behavioral intentions, social connection, and identity construction) were analyzed and compared using a Likert-based scoring framework, supported by a radar chart and comparison matrix. Findings reveal both convergence and divergence between consumer and expert perspectives. While consumers emphasize immediate experiences and logistical constraints, experts uncover deeper emotional motivators such as validation, mentorship, and identity formation. This behavioral–emotional gap suggests that, although digital entertainment dominates due to accessibility, it often lacks the emotional richness associated with physical formats, which are preferred but less accessible. This study underscores the importance of triangulated qualitative inquiry in revealing not only stated preferences but also unconscious psychological needs. It offers actionable insights for designing emotionally intelligent and culturally responsive entertainment strategies in digitally saturated yet infrastructure-limited environments.

(This article belongs to the Special Issue The Emotional Antecedents and Consequences of Buying and Consuming: A Multidisciplinary Perspective on Consumers’ Emotions—Second Edition)

23 pages, 978 KiB

Open Access Article

Emotional Analysis in a Morphologically Rich Language: Enhancing Machine Learning with Psychological Feature Lexicons

by Ron Keinan, Efraim Margalit and Dan Bouhnik

Electronics 2025, 14(15), 3067; https://doi.org/10.3390/electronics14153067 - 31 Jul 2025

Viewed by 264

Abstract

This paper explores emotional analysis in Hebrew texts, focusing on improving machine learning techniques for depression detection by integrating psychological feature lexicons. Hebrew’s complex morphology makes emotional analysis challenging, and this study seeks to address that by combining traditional machine learning methods with sentiment lexicons. The dataset consists of over 350,000 posts from 25,000 users on the health-focused social network “Camoni” from 2010 to 2021. Various machine learning models (SVM, Random Forest, Logistic Regression, and Multi-Layer Perceptron) were used, alongside ensemble techniques like Bagging, Boosting, and Stacking. TF-IDF was applied for feature selection, with word and character n-grams, and pre-processing steps like punctuation removal, stop word elimination, and lemmatization were performed to handle Hebrew’s linguistic complexity. The models were enriched with sentiment lexicons curated by professional psychologists. The study demonstrates that integrating sentiment lexicons significantly improves classification accuracy. Specific lexicons, such as those for negative and positive emojis, hostile words, anxiety words, and no-trust words, were particularly effective in enhancing model performance. Our best model classified depression with an accuracy of 84.1%. These findings offer insights into depression detection, suggesting that practitioners in mental health and social work can improve their machine learning models for detecting depression in online discourse by incorporating emotion-based lexicons. The societal impact of this work lies in its potential to improve the detection of depression in online Hebrew discourse, offering more accurate and efficient methods for mental health interventions in online communities.
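One way to picture how lexicon information can enrich TF-IDF features is the sketch below. It is an illustration under stated assumptions, not the study's pipeline: the two tiny English lexicons are hypothetical stand-ins for the psychologist-curated Hebrew lexicons, the smoothed IDF formula is one common variant, and the function names are invented for this example. Each document's TF-IDF vector is simply extended with the fraction of its tokens that fall in each lexicon.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using raw term frequency times smoothed IDF."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors

def lexicon_features(doc, lexicons):
    """Fraction of the document's tokens that belong to each lexicon."""
    total = max(len(doc), 1)
    return [sum(tok in words for tok in doc) / total for words in lexicons.values()]

# Hypothetical English stand-ins for the curated Hebrew lexicons.
LEXICONS = {
    "anxiety": {"worried", "afraid", "panic"},
    "hostile": {"hate", "angry"},
}

docs = [
    "i am worried and afraid all the time".split(),
    "had a nice walk today".split(),
]
vocab, tfidf = tfidf_vectors(docs)
# Concatenate TF-IDF features with one extra column per lexicon.
features = [vec + lexicon_features(doc, LEXICONS) for vec, doc in zip(tfidf, docs)]
```

The resulting `features` rows could then be passed to any of the classifiers named in the abstract; that step is omitted here.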

(This article belongs to the Special Issue Techniques and Applications of Multimodal Data Fusion)

20 pages, 890 KiB

Open Access Article

Enhancing Cultural Sustainability in Ethnographic Museums: A Multi-Dimensional Visitor Experience Framework Based on Analytic Hierarchy Process (AHP)

by Chao Ruan, Suhui Qiu and Hang Yao

Sustainability 2025, 17(15), 6915; https://doi.org/10.3390/su17156915 - 30 Jul 2025

Viewed by 417

Abstract

This study examines how a visitor-centered approach enhances engagement, participation, and intangible heritage transmission to support cultural sustainability in ethnographic museums. We conducted online and on-site behavioral observations, questionnaire surveys, and in-depth interviews at the She Ethnic Minority Museum to identify gaps in current visitor experience design. We combined the Analytic Hierarchy Process (AHP) with the Contextual Model of Learning (POE) and Emotional Experience Theory (EET) to develop a hierarchical evaluation model. The model comprises one goal layer, three criterion layers (Experience, Participation, Transmission), and twelve sub-criteria, each evaluated across People, Object, and Environment dimensions. Quantitative weighting revealed that participation exerts the greatest influence, followed by transmission and experience. Findings indicate that targeted interventions promoting active participation most effectively foster emotional resonance and heritage transmission, while strategies supporting intergenerational engagement and immersive experiences also play a significant role. We recommend prioritizing small-scale, low-cost participatory initiatives and integrating online and offline community engagement to establish a participatory chain where engagement leads to meaningful experiences and sustained cultural transmission. These insights offer practical guidance for museum practitioners and policymakers seeking to enhance visitor experiences and ensure the long-term preservation and vibrancy of ethnic minority cultural heritage.
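The AHP weighting step can be sketched as follows. The pairwise comparison values are hypothetical and chosen only so that the ordering matches the one reported above (participation, then transmission, then experience); they are not the study's data. The sketch uses the geometric-mean approximation of the priority vector and Saaty's consistency ratio.

```python
import math

def ahp_weights(matrix):
    """Priority weights from a pairwise comparison matrix (geometric-mean method)."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]

def consistency_ratio(matrix, weights):
    """Saaty's consistency ratio; values below 0.1 are conventionally acceptable."""
    n = len(matrix)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's random indices
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    lam = sum(
        sum(a * w for a, w in zip(row, weights)) / weights[i]
        for i, row in enumerate(matrix)
    ) / n
    ci = (lam - n) / (n - 1)
    return ci / random_index[n]

criteria = ["Experience", "Participation", "Transmission"]
# Hypothetical judgments: entry [i][j] says how much more important i is than j.
comparisons = [
    [1.0, 1 / 3, 1 / 2],
    [3.0, 1.0, 2.0],
    [2.0, 1 / 2, 1.0],
]
weights = dict(zip(criteria, ahp_weights(comparisons)))
cr = consistency_ratio(comparisons, list(weights.values()))
```

With these illustrative judgments, Participation receives the largest weight and the consistency ratio stays well under the usual 0.1 cut-off.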

(This article belongs to the Section Tourism, Culture, and Heritage)

24 pages, 2281 KiB

Open Access Article

Multilayer Network Modeling for Brand Knowledge Discovery: Integrating TF-IDF and TextRank in Heterogeneous Semantic Space

by Peng Xu, Rixu Zang, Zongshui Wang and Zhuo Sun

Information 2025, 16(7), 614; https://doi.org/10.3390/info16070614 - 17 Jul 2025

Viewed by 233

Abstract

In the era of homogenized competition, brand knowledge has become a critical factor that influences consumer purchasing decisions. However, traditional single-layer network models fail to capture the multi-dimensional semantic relationships embedded in brand-related textual data. To address this gap, this study proposes a BKMN framework integrating TF-IDF and TextRank algorithms for comprehensive brand knowledge discovery. By analyzing 19,875 consumer reviews of a mobile phone brand from the JD website, we constructed a tri-layer network comprising TF-IDF-derived keywords, TextRank-derived keywords, and their overlapping nodes. The model incorporates co-occurrence matrices and centrality metrics (degree, closeness, betweenness, eigenvector) to identify semantic hubs and interlayer associations. The results reveal that consumers prioritize attributes such as “camera performance”, “operational speed”, “screen quality”, and “battery life”. Notably, the overlap layer exhibits the highest node centrality, indicating convergent consumer focus across algorithms. The network demonstrates small-world characteristics (average path length = 1.627) with strong clustering (average clustering coefficient = 0.848), reflecting cohesive consumer discourse around key features. This study also proposes the Mul-LSTM model for sentiment analysis of reviews, which achieves 93% sentiment classification accuracy and shows that positive attitudes towards the brand’s cell phones predominate among consumers; this provides a quantitative basis for enterprises to understand users’ emotional tendencies and optimize brand word-of-mouth management. This research advances brand knowledge modeling by synergizing heterogeneous algorithms and multilayer network analysis. Its practical implications include enabling enterprises to pinpoint competitive differentiators and optimize marketing strategies. Future work could extend the framework to incorporate sentiment dynamics and cross-domain applications in smart home or cosmetic industries.
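A toy version of the keyword co-occurrence network and its overlap layer might look like the sketch below. The reviews and the two keyword sets are invented for illustration (in the study they would come from TF-IDF and TextRank over the review corpus), and only degree centrality is computed; closeness, betweenness, and eigenvector centrality are omitted.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(documents, keywords):
    """Undirected graph: two keywords are linked if they appear in the same document."""
    graph = defaultdict(set)
    for tokens in documents:
        present = sorted(set(tokens) & keywords)
        for u, v in combinations(present, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph

def degree_centrality(graph, nodes):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(nodes)
    return {node: len(graph[node]) / (n - 1) for node in nodes}

# Hypothetical tokenised reviews and keyword layers.
reviews = [
    ["camera", "battery", "screen"],
    ["camera", "speed"],
    ["battery", "price"],
]
tfidf_keywords = {"camera", "battery", "screen", "price"}
textrank_keywords = {"camera", "battery", "speed"}

overlap_layer = tfidf_keywords & textrank_keywords  # nodes found by both algorithms
all_keywords = tfidf_keywords | textrank_keywords
graph = cooccurrence_graph(reviews, all_keywords)
centrality = degree_centrality(graph, all_keywords)
hubs = {k for k, v in centrality.items() if v == max(centrality.values())}
```

In this toy data the most central nodes happen to lie in the overlap layer, mirroring the pattern the abstract describes, but that outcome is a property of the invented example rather than evidence for the paper's finding.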

16 pages, 1351 KiB

Open Access Article

A Comparative Study on Machine Learning Methods for EEG-Based Human Emotion Recognition

by Shokoufeh Davarzani, Simin Masihi, Masoud Panahi, Abdulrahman Olalekan Yusuf and Massood Atashbar

Electronics 2025, 14(14), 2744; https://doi.org/10.3390/electronics14142744 - 8 Jul 2025

Viewed by 466

Abstract

Electroencephalogram (EEG) signals provide a direct and non-invasive means of interpreting brain activity and are increasingly valuable in embedded emotion-aware systems, particularly for applications in healthcare, wearable electronics, and human–machine interactions. Among various EEG-based emotion recognition techniques, deep learning methods have demonstrated superior performance compared to traditional approaches. This advantage stems from their ability to extract complex features (such as spectral–spatial connectivity, temporal dynamics, and non-linear patterns) from raw EEG data, leading to a more accurate and robust representation of emotional states and better adaptation to diverse data characteristics. This study explores and compares deep and shallow neural networks for human emotion recognition from raw EEG data, with the goal of enabling real-time processing in embedded and edge-deployable systems. Deep learning models, specifically convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been benchmarked against traditional approaches such as the multi-layer perceptron (MLP), support vector machine (SVM), and k-nearest neighbors (kNN) algorithms. This comparative study investigates the effectiveness of deep learning techniques in EEG-based emotion recognition by classifying emotions into four categories based on the valence–arousal plane: high arousal, positive valence (HAPV); low arousal, positive valence (LAPV); high arousal, negative valence (HANV); and low arousal, negative valence (LANV). Evaluations were conducted using the DEAP dataset. The results indicate that both the CNN and RNN-STM models achieve high classification performance in EEG-based emotion recognition, with average accuracies of 90.13% and 93.36%, respectively, significantly outperforming shallow algorithms (MLP, SVM, kNN).
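The four-class labelling on the valence–arousal plane can be made concrete with a short sketch. It assumes self-reported ratings on a 1 to 9 scale split at a midpoint of 5; that threshold is an assumption made here for illustration, since the abstract does not state how the quadrants were cut.

```python
def quadrant(valence, arousal, threshold=5.0):
    """Map a (valence, arousal) rating pair to one of the four class labels."""
    arousal_part = "HA" if arousal > threshold else "LA"
    valence_part = "PV" if valence > threshold else "NV"
    return arousal_part + valence_part

# Hypothetical rating pairs (valence, arousal), one per quadrant.
examples = [(7.5, 8.0), (7.0, 2.5), (2.0, 8.5), (3.0, 1.5)]
labels = [quadrant(v, a) for v, a in examples]
```

A classifier such as the CNN or RNN models described above would then be trained to predict these labels from the EEG segments recorded during each trial.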

(This article belongs to the Special Issue New Advances in Embedded Software and Applications)

26 pages, 4157 KiB

Open Access Article

Cultural and Ekistic Heritage of Princes’ Islands: A Study on Halki and Its Enhancement Through Augmented Reality

by Anna Chatsiopoulou, Vasilis Dimitriadis, Maria Panakaki, Eleni G. Gavra, Nikolaos Liazos and Panagiotis D. Michailidis

Heritage 2025, 8(7), 243; https://doi.org/10.3390/heritage8070243 - 23 Jun 2025

Viewed by 559

Abstract

This study aims to photograph, design, and digitally document the surviving residential buildings on the island of Halki (Heybeliada), within the Princes’ Islands. This documentation focuses on the architectural, urban, and historical aspects of Halki, highlighting the significant material evidence of the Greek social and economic presence. It also examines the urban cultural heritage as depicted in Turkish literature of that period to understand how Turkish writers perceived and presented Halki, referencing the Princes’ Islands only for background context. The methodology includes the collection of material from residents through bibliographic and field research conducted on Halki. Based on these findings, a mobile augmented reality (AR) application was developed using the TaleBlazer platform, designed specifically for use on Halki. The application provides a virtual tour with multimedia-supported thematic layers of architectural and historical information. Its usability and learnability were evaluated using a questionnaire completed by students. The results showed high usability, user satisfaction, and perceived value of learning, with the majority of results close to a median score of 4 out of 5. The students reported an immersive experience, ease of use, and emotional stimulation created by the integration of spatial storytelling and multimedia. This paper also shows how the convergence of cultural content (history, architecture, and literature) can enhance interpretations and experiences with mobile AR technologies.

18 pages, 4253 KiB

Open Access Article

The Emotional Landscape of Technological Innovation: A Data-Driven Case Study of ChatGPT’s Launch

by Lowri Williams and Pete Burnap

Informatics 2025, 12(3), 58; https://doi.org/10.3390/informatics12030058 - 22 Jun 2025

Viewed by 728

Abstract

The rapid development and deployment of artificial intelligence (AI) technologies have sparked intense public interest and debate. While these innovations promise to revolutionise various aspects of human life, it is crucial to understand the complex emotional responses they elicit from potential adopters and users. Such findings can offer crucial guidance for stakeholders involved in the development, implementation, and governance of AI technologies like OpenAI’s ChatGPT, a large language model (LLM) that garnered significant attention upon its release, enabling more informed decision-making regarding potential challenges and opportunities. While previous studies have employed data-driven approaches towards investigating public reactions to emerging technologies, they often relied on sentiment polarity analysis, which categorises responses as positive or negative. However, this binary approach fails to capture the nuanced emotional landscape surrounding technological adoption. This paper overcomes this limitation by presenting a comprehensive analysis for investigating the emotional landscape surrounding technology adoption by using the launch of ChatGPT as a case study. In particular, a large corpus of social media texts containing references to ChatGPT was compiled. Text mining techniques were applied to extract emotions, capturing a more nuanced and multifaceted representation of public reactions. This approach allows the identification of specific emotions such as excitement, fear, surprise, and frustration, providing deeper insights into user acceptance, integration, and potential adoption of the technology. By analysing this emotional landscape, we aim to provide a more comprehensive understanding of the factors influencing ChatGPT’s reception and potential long-term impact. Furthermore, we employ topic modelling to identify and extract the common themes discussed across the dataset. This additional layer of analysis allows us to understand the specific aspects of ChatGPT driving different emotional responses. By linking emotions to particular topics, we gain a more contextual understanding of public reaction, which can inform decision-making processes in the development, deployment, and regulation of AI technologies.

(This article belongs to the Section Big Data Mining and Analytics)

20 pages, 1371 KiB

Open Access Article

EEG Emotion Recognition Using AttGraph: A Multi-Dimensional Attention-Based Dynamic Graph Convolutional Network

by Shuai Zhang, Chengxi Chu, Xin Zhang and Xiu Zhang

Brain Sci. 2025, 15(6), 615; https://doi.org/10.3390/brainsci15060615 - 7 Jun 2025

Viewed by 675

Abstract

Background/Objectives: Electroencephalogram (EEG) signals, which reflect brain activity, are widely used in emotion recognition. However, the variety of EEG features presents significant challenges in identifying key features, reducing redundancy, and simplifying the computational process. Methods: To address these challenges, this paper proposes a multi-dimensional attention-based dynamic graph convolutional neural network (AttGraph) model. The model delves into the impact of different EEG features on emotion recognition by evaluating their sensitivity to emotional changes, providing richer and more accurate feature information. Results: Through the dynamic weighting of EEG features via a multi-dimensional attention convolution layer, the AttGraph method is able to precisely detect emotional changes and automatically choose the most discriminative features for emotion recognition tasks. This approach significantly improves the model’s recognition accuracy and robustness. Finally, subject-independent and subject-dependent experiments were conducted on two public datasets. Conclusions: Through comparisons and analyses with existing methods, the proposed AttGraph method demonstrated outstanding performances in emotion recognition tasks, with stronger generalization ability and adaptability.

(This article belongs to the Section Computational Neuroscience, Neuroinformatics, and Neurocomputing)

26 pages, 8111 KiB

Open Access Article

Spatial Perception: How Paper Art Realizes the Expansion Design of Urban Spaces

by Dingwei Zhang, Xiaotong Zhang and Hongtao Zhou

Buildings 2025, 15(12), 1967; https://doi.org/10.3390/buildings15121967 - 6 Jun 2025

Viewed by 658

Abstract

Aiming at the problems of insufficient function, cultural aphasia, and blunted perception faced by contemporary urban public space, this study explores the potential of paper-based materials in enhancing spatial quality and realizing spatial expansion effects, providing new solutions for urban renewal. Taking the sensory plasticity, visual aesthetics, cultural carrying, and ecological and environmental protection of paper materials as the entry point, we constructed a theoretical model of “paper art space expansion”. Through the design intervention strategy, we explored the application of paper art in the design of interface, space, art creation, and cultural empowerment from visual and tactile perspectives. Through course design, artist interviews, and questionnaire analysis, the study shows that (1) paper material can achieve a balance between function and aesthetics through multi-dimensional design strategies; (2) its environmental attributes and emotional healing value can effectively enhance the emotional connection between people and space; and (3) the contemporary translation of paper art provides an important path for cultural empowerment. This study forms a three-dimensional design framework of “Perception Layer-Technology Layer-Cultural Layer” and proposes a set of innovative models for the application of paper materials in contemporary art and space design, which can provide support for the expansion of space and the increase in content. Future research will focus on the transition of paper art from decoration to the design paradigm of the cultural narrative of intelligent space, deepening the value of paper material as an ecological, cultural, and technological medium, and opening up a new direction for the theory and practice of spatial design. At the same time, more attention will be paid to the exploration of the possibility of sensory healing for the blind and other special populations.

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

21 pages, 283 KiB

Open Access Article

Benefits, Challenges, and Steps Forward on Using Poetry Workshops in Interdisciplinary Migration Research: Reflections from the Field and Methodological Insights

by Nikola Lero, Jasmin Donlic and Marjan Marjanović

Societies 2025, 15(6), 158; https://doi.org/10.3390/soc15060158 - 6 Jun 2025

Viewed by 1448

Abstract

This article offers a critical methodological reflection on the use of poetry workshops in migration research, positioning them as empowering, ethically complex, yet powerful research tools for studying migrant experience. While arts-based methods have gained momentum, their application often lacks critical reflexivity regarding their benefits, challenges, and interdisciplinary potential. Drawing on the design and implementation of over 50 poetry workshops facilitated by the author across Bosnian/Yugoslav, U.S., and U.K. diaspora contexts, this paper employs an autoethnographic and participatory lens to explore the workshops’ dual role as sites of empowerment and tools for epistemic transformation. Beyond examining their use in participatory action research (PAR), the paper highlights how poetry workshops can serve as interdisciplinary research tools that capture not only emotional and narrative dimensions of displacement but also spatial and material aspects of migrant experience. In doing so, the paper contributes to a broader rethinking of qualitative migration research by integrating methods from the social-oriented to spatial-oriented disciplines. Ultimately, it calls for a shift from viewing poetry as an extractive technique to embracing it as a reflexive, practical research method, capable of producing richly layered, interdisciplinary knowledge about transnational migrant lives.

(This article belongs to the Special Issue Doing and Critically Evaluating Participatory Action Research in Migration Studies)

22 pages, 4051 KiB

Open Access Article

Optimizing an LSTM Self-Attention Architecture for Portuguese Sentiment Analysis Using a Genetic Algorithm

by Daniel Parada, Alexandre Branco, Marcos Silva, Fábio Mendonça, Sheikh Mostafa and Fernando Morgado-Dias

Appl. Sci. 2025, 15(11), 6336; https://doi.org/10.3390/app15116336 - 5 Jun 2025

Viewed by 461

Abstract

Sentiment analysis is a Natural Language Processing (NLP) task that identifies the opinion or emotional tone of documents such as customer reviews, either at the general or detailed level. Improving domain-specific models is important, as it provides smaller and better-suited models that can be implemented by entities that own textual data. This paper presents a deep learning model trained on Portuguese restaurant reviews using recurrent and self-attention mechanisms, which have consistently delivered strong results in prior research studies. Designing an effective model involves numerous hyperparameters and architectural choices. To address this complexity, a discrete genetic algorithm was used to find an optimal configuration, selecting the layer types, placement of self-attention, dropout rate, and model dimensions and shape. A key outcome of this study was that the optimization process produced a model that is competitive with a Bidirectional Encoder Representation from Transformers (BERT) model retrained for Portuguese, which was used as the baseline. The proposed model achieved an area under the curve of 92.1% and F1-score of 75.4%, demonstrating that a small, optimized model can compete with and even outperform larger state-of-the-art models. Moreover, this work helps address the scarcity of NLP resources for Portuguese, and highlights the potential of customized architectures over generic solutions.
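A discrete genetic algorithm over architecture choices can be sketched as below. Everything specific here is an assumption made for illustration: the search space, the operators (truncation selection, uniform crossover, per-gene mutation, elitism), and above all `toy_fitness`, which is a cheap stand-in for what would in practice be the validation score of a trained model. None of it is taken from the paper.

```python
import random

SEARCH_SPACE = {  # hypothetical options, not the paper's search space
    "layer_type": ["lstm", "gru", "bilstm"],
    "attention_position": ["none", "before_rnn", "after_rnn"],
    "dropout": [0.1, 0.3, 0.5],
    "hidden_units": [32, 64, 128],
}

def toy_fitness(config):
    """Stand-in score; a real run would train and validate a model here."""
    score = {"lstm": 0.2, "gru": 0.1, "bilstm": 0.3}[config["layer_type"]]
    score += {"none": 0.0, "before_rnn": 0.1, "after_rnn": 0.2}[config["attention_position"]]
    score += 0.2 - abs(config["dropout"] - 0.3)
    score += config["hidden_units"] / 640
    return score

def random_config(rng):
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {key: rng.choice([parent_a[key], parent_b[key]]) for key in SEARCH_SPACE}

def mutate(config, rng, rate=0.2):
    """Resample each gene from its option list with probability `rate`."""
    return {key: rng.choice(SEARCH_SPACE[key]) if rng.random() < rate else value
            for key, value in config.items()}

def evolve(generations=20, population_size=10, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(population[0]))
        parents = population[: population_size // 2]  # truncation selection
        children = []
        while len(children) < population_size - 1:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = [population[0]] + children  # elitism keeps the current best
    return max(population, key=toy_fitness), history

best_config, history = evolve()
```

Because the best individual is always carried over unchanged, the best fitness never decreases from one generation to the next, which is a convenient property when each fitness evaluation is an expensive training run.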

22 pages, 17966 KiB

Open Access Article

CTIFERK: A Thermal Infrared Facial Expression Recognition Model with Kolmogorov–Arnold Networks for Smart Classrooms

by Zhaoyu Shou, Yongsheng Tang, Dongxu Li, Jianwen Mo and Cheng Feng

Symmetry 2025, 17(6), 864; https://doi.org/10.3390/sym17060864 - 2 Jun 2025

Viewed by 490

Abstract

Accurate recognition of student emotions in smart classrooms is vital for understanding learning states. Visible light-based facial expression recognition is often affected by illumination changes, making thermal infrared imaging a promising alternative due to its robust temperature distribution symmetry. This paper proposes CTIFERK, a thermal infrared facial expression recognition model integrating Kolmogorov–Arnold Networks (KANs). By incorporating multiple KAN layers, CTIFERK enhances feature extraction and fitting capabilities. It also balances pooling layer information from the MobileViT backbone to preserve symmetrical facial features, improving recognition accuracy. Experiments on the Tufts Face Database, the IRIS Database, and the self-constructed GUET thermalface dataset show that CTIFERK achieves accuracies of 81.82%, 82.19%, and 65.22%, respectively, outperforming baseline models. These results validate CTIFERK’s effectiveness and superiority for thermal infrared expression recognition in smart classrooms, enabling reliable emotion monitoring.

(This article belongs to the Section Computer)

35 pages, 5913 KiB

Open Access Article

Embedding Fear in Medical AI: A Risk-Averse Framework for Safety and Ethics

by Andrej Thurzo and Vladimír Thurzo

AI 2025, 6(5), 101; https://doi.org/10.3390/ai6050101 - 14 May 2025

Viewed by 1891

Abstract

In today’s high-stakes arenas, from healthcare to defense, algorithms are advancing at an unprecedented pace, yet they still lack a crucial element of human decision-making: an instinctive caution that helps prevent harm. Inspired by both the protective reflexes seen in military robotics and the human amygdala’s role in threat detection, we introduce a novel idea: an integrated module that acts as an internal “caution system”. This module does not experience emotion in the human sense; rather, it serves as an embedded safeguard that continuously assesses uncertainty and triggers protective measures whenever potential dangers arise. Our proposed framework combines several established techniques. It uses Bayesian methods to continuously estimate the likelihood of adverse outcomes, applies reinforcement learning strategies with penalties for choices that might lead to harmful results, and incorporates layers of human oversight to review decisions when needed. The result is a system that mirrors the prudence and measured judgment of experienced clinicians, hesitating and recalibrating its actions when the data are ambiguous, much as a doctor relies on both intuition and expertise to prevent errors. We call on computer scientists, healthcare professionals, and policymakers to collaborate in refining and testing this approach. Through joint research, pilot projects, and robust regulatory guidelines, we aim to ensure that advanced computational systems can combine speed and precision with an inherent predisposition toward protecting human life. Ultimately, by embedding this cautionary module, the framework is expected to significantly reduce AI-induced risks and enhance patient safety and trust in medical AI systems. It seems inevitable for future superintelligent AI systems in medicine to possess emotion-like processes.

(This article belongs to the Special Issue Artificial Intelligence in Biomedical Engineering: Challenges and Developments)
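As a rough illustration of the "caution system" idea described in the abstract above, the sketch below keeps a Bayesian (Beta-Bernoulli) estimate of the adverse-outcome rate, penalizes the reward by that estimate, and defers to a human reviewer once the estimate crosses a threshold. It is not the authors' implementation: the class name `CautionModule`, the uniform prior, the `risk_threshold` of 0.2, and the penalty weight are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CautionModule:
    """Hypothetical caution module; all defaults are illustrative assumptions."""
    alpha: float = 1.0           # prior pseudo-count of adverse outcomes
    beta: float = 1.0            # prior pseudo-count of safe outcomes
    risk_threshold: float = 0.2  # assumed escalation threshold
    penalty_weight: float = 5.0  # assumed weight of the risk penalty

    def update(self, adverse: bool) -> None:
        # Bayesian update of the Beta posterior after one observed outcome.
        if adverse:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def risk(self) -> float:
        # Posterior mean probability of an adverse outcome.
        return self.alpha / (self.alpha + self.beta)

    def penalized_reward(self, reward: float) -> float:
        # Reward shaping: subtract a penalty proportional to estimated risk.
        return reward - self.penalty_weight * self.risk()

    def decide(self) -> str:
        # Escalate to human oversight when the estimated risk is too high.
        return "defer_to_human" if self.risk() > self.risk_threshold else "proceed"


module = CautionModule()
for _ in range(8):
    module.update(adverse=False)   # eight safe outcomes observed
decision_after_safe = module.decide()

module.update(adverse=True)
module.update(adverse=True)        # two adverse outcomes observed
decision_after_adverse = module.decide()
```

A real system would more likely condition the risk estimate on patient features and use a credible interval rather than the posterior mean; the single global rate here only keeps the example short.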

26 pages, 7054 KiB

Open AccessArticle

An Ensemble of Convolutional Neural Networks for Sound Event Detection

by Abdinabi Mukhamadiyev, Ilyos Khujayarov, Dilorom Nabieva and Jinsoo Cho

Mathematics 2025, 13(9), 1502; https://doi.org/10.3390/math13091502 - 1 May 2025

Viewed by 1093

Abstract

Sound event detection tasks are rapidly advancing in the field of pattern recognition, and deep learning methods are particularly well suited for such tasks. One of the important directions in this field is to detect the sounds of emotional events around residential buildings in smart cities and quickly assess the situation for security purposes. This research presents a comprehensive study of an ensemble convolutional recurrent neural network (CRNN) model designed for sound event detection (SED) in residential and public safety contexts. The work focuses on extracting meaningful features from audio signals using image-based representations, such as Discrete Cosine Transform (DCT) spectrograms, cochleagrams, and Mel spectrograms, to enhance robustness against noise and improve feature extraction. In collaboration with police officers, a two-hour dataset consisting of 112 clips related to four classes of emotional sounds, such as harassment, quarrels, screams, and breaking sounds, was prepared. In addition to the crowdsourced dataset, publicly available datasets were used to broaden the study’s applicability. Our dataset contains 5055 audio files of different lengths totaling 14.14 h and strongly labeled data. The dataset consists of 13 separate sound categories. The proposed CRNN model integrates spatial and temporal feature extraction by processing these spectrograms through convolution and bi-directional gated recurrent unit (GRU) layers. An ensemble approach combines predictions from three models, achieving F1 scores of 71.5% for segment-based metrics and 46% for event-based metrics. The results demonstrate the model’s effectiveness in detecting sound events under noisy conditions, even with a small, unbalanced dataset. This research highlights the potential of the model for real-time audio surveillance systems using mini-computers, offering cost-effective and accurate solutions for maintaining public order.

(This article belongs to the Special Issue Advanced Machine Vision with Mathematics)
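The abstract above states that predictions from three models are combined but does not say how. One common option is soft voting, sketched below: per-frame class probabilities are averaged across models and then thresholded. The function name `ensemble_average`, the 0.5 threshold, the toy probabilities, and the pairing of one model per spectrogram type are assumptions for illustration, not details from the paper.

```python
import numpy as np


def ensemble_average(prob_list, threshold=0.5):
    """Soft-voting sketch: average model probabilities, then threshold.

    prob_list: list of arrays shaped (n_frames, n_classes), one per model.
    Returns the mean probabilities and a binary event-activity matrix.
    """
    stacked = np.stack(prob_list, axis=0)   # (n_models, n_frames, n_classes)
    mean_prob = stacked.mean(axis=0)        # average over the model axis
    active = (mean_prob >= threshold).astype(int)
    return mean_prob, active


# Hypothetical outputs of three models for 2 frames and 2 sound classes.
p_dct = np.array([[0.9, 0.2], [0.4, 0.7]])
p_cochleagram = np.array([[0.8, 0.1], [0.6, 0.5]])
p_mel = np.array([[0.7, 0.3], [0.2, 0.9]])

mean_prob, active = ensemble_average([p_dct, p_cochleagram, p_mel])
```

Averaging probabilities rather than hard labels lets a confident model outweigh two uncertain ones; majority voting on thresholded outputs would be an equally plausible reading of the abstract.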

Search Results (226)