MDPI - Publisher of Open Access Journals

27 pages, 1766 KiB

Open Access Article

A Novel Optimized Hybrid Deep Learning Framework for Mental Stress Detection Using Electroencephalography

by Maithili Shailesh Andhare, T. Vijayan, B. Karthik and Shabana Urooj

Brain Sci. 2025, 15(8), 835; https://doi.org/10.3390/brainsci15080835 (registering DOI) - 4 Aug 2025

Mental stress is a psychological or emotional strain that typically arises from threatening, challenging, or overwhelming conditions and affects human behavior. It is often triggered by professional, environmental, and personal pressures. In recent years, various deep learning (DL)-based schemes using electroencephalograms (EEGs) have been proposed. However, making DL-based schemes effective is challenging because of intricate DL structures, class imbalance, poor feature representation, low frequency resolution, and the complexity of multi-channel signal processing. This paper presents a novel hybrid DL framework, BDDNet, which combines a deep convolutional neural network (DCNN), bidirectional long short-term memory (BiLSTM), and a deep belief network (DBN). BDDNet provides superior spectral–temporal feature representation and better captures long-term dependencies in the local and global features of EEGs. BDDNet accepts multiple EEG features (MEFs) that provide the spectral and time-domain features of EEGs. A novel improved crow search algorithm (ICSA) is presented for channel selection to minimize the computational complexity of multi-channel stress detection. Further, the novel employee optimization algorithm (EOA) is used for hyper-parameter optimization of the hybrid BDDNet to enhance training performance. BDDNet was evaluated on the public DEAP dataset. BDDNet-ICSA achieves a recall of 97.6%, precision of 97.6%, F1-score of 97.6%, selectivity of 96.9%, negative predictive value (NPV) of 96.9%, and accuracy of 97.3%, improving on traditional techniques.

(This article belongs to the Section Computational Neuroscience, Neuroinformatics, and Neurocomputing)
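
All six metrics reported for BDDNet-ICSA are standard functions of a binary confusion matrix; “selectivity” is another name for specificity. The sketch below computes them from raw counts. It assumes a two-class (stress vs. no stress) setting and uses made-up counts; it does not reproduce the paper’s evaluation pipeline.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts, with "stress" as the positive class."""
    tp: int
    fp: int
    tn: int
    fn: int


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the six metrics named in the abstract."""
    recall = c.tp / (c.tp + c.fn)             # sensitivity
    precision = c.tp / (c.tp + c.fp)
    f1 = 2 * precision * recall / (precision + recall)
    selectivity = c.tn / (c.tn + c.fp)        # specificity
    npv = c.tn / (c.tn + c.fn)                # negative predictive value
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "selectivity": selectivity,
        "npv": npv,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only (not from the paper).
example = classification_metrics(ConfusionCounts(tp=90, fp=10, tn=80, fn=20))
```

When precision equals recall, their harmonic mean (the F1-score) equals the same value, which is consistent with the identical 97.6% figures for recall, precision, and F1-score.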

49 pages, 5272 KiB

Open Access Article

Redefining Urban Boundaries for Health Planning Through an Equity Lens: A Socio-Demographic Spatial Analysis Model in the City of Rome

by Elena Mazzalai, Susanna Caminada, Lorenzo Paglione and Livia Maria Salvatori

Land 2025, 14(8), 1574; https://doi.org/10.3390/land14081574 - 31 Jul 2025

Abstract

Urban health planning requires a multi-scalar understanding of the territory, capable of capturing socio-economic inequalities and health needs at the local level. In Rome, the current administrative subdivisions, the Urban Zones (Zone Urbanistiche), are too large and internally heterogeneous to serve as effective units for equitable health planning. This study presents a methodology for the territorial redefinition of Rome’s Municipality III, aimed at supporting healthcare planning through an integrated analysis of census sections. These sections were grouped using a combination of census-based socio-demographic indicators (educational attainment, employment status, single-person households) and real estate values (OMI data), alongside administrative and road network data. The resulting territorial units, 21 newly defined Mesoareas, are smaller than Urban Zones but larger than individual census sections, and they correspond to socio-territorially homogeneous neighborhoods; this structure enables a more nuanced spatial understanding of health-related inequalities. The proposed model is replicable and adaptable to other urban contexts, and it offers a solid analytical basis for more equitable and targeted health planning, as well as for broader urban policy interventions aimed at promoting spatial justice.

(This article belongs to the Special Issue Integrating Spatial Analysis and Regional Science to Guide Urban Planning)
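
The abstract does not spell out the grouping algorithm. As a hypothetical illustration only, the sketch below standardizes each indicator and then greedily merges the most similar pair of adjacent groups until a target number of units remains, so every unit stays spatially contiguous. The function names, the Euclidean distance on z-scores, and the greedy merge rule are assumptions; the road-network and administrative boundaries the authors also used are not modeled.

```python
import math
from statistics import mean, pstdev


def standardize(features):
    """Z-score every indicator across all census sections."""
    keys = list(features)
    columns = list(zip(*(features[k] for k in keys)))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return {
        k: [(v - m) / s for v, (m, s) in zip(features[k], stats)] for k in keys
    }


def centroid(members, z):
    """Mean standardized indicator vector of a group of sections."""
    return [mean(col) for col in zip(*(z[m] for m in members))]


def merge_adjacent(features, adjacency, target_groups):
    """Greedily merge the most similar pair of adjacent groups (hypothetical).

    features: {section_id: [indicator values]}
    adjacency: {section_id: set of neighbouring section_ids}
    """
    z = standardize(features)
    groups = {k: {k} for k in features}  # group id -> member sections
    while len(groups) > target_groups:
        best = None
        ids = sorted(groups)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                touching = any(
                    n in groups[b] for m in groups[a] for n in adjacency[m]
                )
                if not touching:
                    continue  # only spatially contiguous merges are allowed
                d = math.dist(centroid(groups[a], z), centroid(groups[b], z))
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best is None:  # no adjacent pairs remain
            break
        _, a, b = best
        merged = groups.pop(b)
        groups[a] |= merged
    return list(groups.values())
```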

25 pages, 2082 KiB

Open Access Article

XTTS-Based Data Augmentation for Profanity Keyword Recognition in Low-Resource Speech Scenarios

by Shin-Chi Lai, Yi-Chang Zhu, Szu-Ting Wang, Yen-Ching Chang, Ying-Hsiu Hung, Jhen-Kai Tang and Wen-Kai Tsai

Appl. Syst. Innov. 2025, 8(4), 108; https://doi.org/10.3390/asi8040108 - 31 Jul 2025

Abstract

As voice cloning technology rapidly advances, the risk of personal voices being misused by malicious actors for fraud or other illegal activities has increased significantly, making the collection of speech data increasingly challenging. To address this issue, this study proposes a data augmentation method based on XText-to-Speech (XTTS) synthesis to tackle small-sample, multi-class speech recognition, using profanity as a case study for high-accuracy keyword recognition. Two models were evaluated: a CNN model (Proposed-I) and a CNN-Transformer hybrid model (Proposed-II). Proposed-I leverages local feature extraction and improves accuracy on a real human speech (RHS) test set from 55.35% without augmentation to 80.36% with XTTS-augmented data. Proposed-II combines the CNN’s local feature extraction with the Transformer’s long-range dependency modeling, further raising test-set accuracy to 88.90% while reducing the parameter count by approximately 41%, which substantially improves computational efficiency. Compared with a previously proposed incremental architecture, Proposed-II achieves 8.49% higher accuracy while reducing parameters by about 98.81% and MACs by about 98.97%. By using XTTS and public corpora to generate a new keyword speech dataset, this study enhances sample diversity and reduces reliance on large-scale original speech data. Experimental analysis shows that a synthetic-to-real speech ratio of 1:5 is optimal and significantly improves overall system accuracy, effectively addressing data scarcity. In addition, Proposed-I and Proposed-II achieve accuracies of 97.54% and 98.66%, respectively, in distinguishing real from synthetic speech, demonstrating their strong potential for speech security and anti-spoofing applications.

(This article belongs to the Special Issue Advancements in Deep Learning and Its Applications)
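
A ratio such as 1:5 still has to be turned into a concrete training set. The following sketch assumes that clips are simply items in Python lists and that the ratio means one synthetic clip per five real clips; it keeps every real clip and subsamples synthetic ones to match. The function name and sampling policy are illustrative, not the authors’ code.

```python
import random


def mix_synthetic(real, synthetic, ratio=(1, 5), seed=0):
    """Build a shuffled training list with a synthetic:real ratio.

    ratio = (synthetic_parts, real_parts); (1, 5) mirrors the 1:5 ratio the
    abstract reports as optimal. Every real clip is kept, and synthetic clips
    are subsampled without replacement to hit the ratio where possible.
    """
    syn_parts, real_parts = ratio
    n_syn = min(len(synthetic), len(real) * syn_parts // real_parts)
    rng = random.Random(seed)
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed
```

For example, 50 real clips and a pool of 100 synthetic clips yield a 60-item list containing 10 synthetic clips.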

18 pages, 2335 KiB

Open Access Article

MLLM-Search: A Zero-Shot Approach to Finding People Using Multimodal Large Language Models

by Angus Fung, Aaron Hao Tan, Haitong Wang, Bensiyon Benhabib and Goldie Nejat

Robotics 2025, 14(8), 102; https://doi.org/10.3390/robotics14080102 - 28 Jul 2025

Abstract

Robotic search for people in human-centered environments, including healthcare settings, is challenging, as autonomous robots need to locate people with incomplete or no prior knowledge of their schedules, plans, or locations. Furthermore, robots need to adapt to real-time events that can influence a person’s plan in an environment. In this paper, we present MLLM-Search, a novel zero-shot person search architecture that leverages multimodal large language models (MLLMs) to address the problem of a mobile robot searching for a person under event-driven scenarios with varying user schedules. Our approach introduces a novel visual prompting method that gives robots a spatial understanding of the environment by generating a spatially grounded waypoint map, which represents navigable waypoints as a topological graph and regions by semantic labels. This map is incorporated into an MLLM with a region planner, which selects the next search region based on its semantic relevance to the search scenario, and a waypoint planner, which generates a search path by considering semantically relevant objects and the local spatial context through our spatial chain-of-thought prompting approach. Extensive experiments in photorealistic 3D environments were conducted to validate the performance of MLLM-Search in searching for a person with a changing schedule in different environments. An ablation study was also conducted to validate the main design choices of MLLM-Search. Furthermore, a comparison with state-of-the-art search methods demonstrated that MLLM-Search outperforms existing methods in search efficiency. Real-world experiments with a mobile robot on a multi-room floor of a building showed that MLLM-Search generalizes to new and unseen environments.

(This article belongs to the Section Intelligent Robots and Mechatronics)
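
The two-level planning described above (choose a region, then plan a path through waypoints) can be sketched with a plain graph search. In this hypothetical version, fixed relevance scores stand in for the MLLM’s semantic judgment, and a breadth-first search over an unweighted topological waypoint graph stands in for the waypoint planner; neither is the paper’s prompting-based method.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an unweighted waypoint graph."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable


def next_region(region_scores, visited):
    """Pick the unvisited region with the highest (hypothetical) relevance."""
    candidates = {r: s for r, s in region_scores.items() if r not in visited}
    return max(candidates, key=candidates.get) if candidates else None
```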

52 pages, 3733 KiB

Open Access Article

A Hybrid Deep Reinforcement Learning and Metaheuristic Framework for Heritage Tourism Route Optimization in Warin Chamrap’s Old Town

by Rapeepan Pitakaso, Thanatkij Srichok, Surajet Khonjun, Natthapong Nanthasamroeng, Arunrat Sawettham, Paweena Khampukka, Sairoong Dinkoksung, Kanya Jungvimut, Ganokgarn Jirasirilerd, Chawapot Supasarn, Pornpimol Mongkhonngam and Yong Boonarree

Heritage 2025, 8(8), 301; https://doi.org/10.3390/heritage8080301 - 28 Jul 2025

Abstract

Designing optimal heritage tourism routes in secondary cities involves complex trade-offs between cultural richness, travel time, carbon emissions, spatial coherence, and group satisfaction. This study addresses the Personalized Group Trip Design Problem (PGTDP) under real-world constraints by proposing DRL–IMVO–GAN, a hybrid multi-objective optimization framework that integrates Deep Reinforcement Learning (DRL) for policy-guided initialization, an Improved Multiverse Optimizer (IMVO) for global search, and a Generative Adversarial Network (GAN) for local refinement and solution diversity. The model operates within a digital twin of Warin Chamrap’s old town, leveraging 92 points of interest (POIs), congestion heatmaps, and behaviorally clustered tourist profiles. The proposed method was benchmarked against seven state-of-the-art techniques, including PSO + DRL, Genetic Algorithm with Multi-Neighborhood Search (Genetic + MNS), Dual-ACO, and ALNS-ASP. Results demonstrate that DRL–IMVO–GAN consistently dominates across key metrics. Under equal-objective weighting, it attained the highest heritage score (74.2), the shortest travel time (21.3 min), and the top satisfaction score (17.5 out of 18), along with the highest hypervolume (0.85) and Pareto Coverage Ratio (0.95). Beyond performance, the framework generalizes well in zero- and few-shot scenarios, adapting to unseen POIs, modified constraints, and new user profiles without retraining. These findings underscore the method’s robustness, behavioral coherence, and interpretability, positioning it as a scalable, intelligent decision-support tool for sustainable and user-centered cultural tourism planning in secondary cities.

(This article belongs to the Special Issue AI and the Future of Cultural Heritage)
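
Hypervolume, one of the reported metrics, measures the region of objective space dominated by a solution set and bounded by a reference point. The sketch below computes it exactly for two minimization objectives; the paper’s multi-objective, normalized setting is not reproduced, and a maximization objective such as the heritage score would need to be negated first.

```python
def pareto_front(points):
    """Keep the non-dominated points (both objectives minimized)."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front and bounded by the reference point."""
    # Sorted by the first objective, a 2-D front has a decreasing second one,
    # so the area splits into horizontal slabs between consecutive points.
    front = sorted(
        p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1]
    )
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area
```

For the points (1, 3), (2, 2), (3, 1), and (3, 3) with reference point (4, 4), the last point is dominated and the slabs contribute 3 + 2 + 1, giving a hypervolume of 6.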

25 pages, 1330 KiB

Open Access Review

Cardioprotection Reloaded: Reflections on 40 Years of Research

by Pasquale Pagliaro, Giuseppe Alloatti and Claudia Penna

Antioxidants 2025, 14(7), 889; https://doi.org/10.3390/antiox14070889 - 18 Jul 2025

Abstract

Over the past four decades, cardioprotection research has revealed an extraordinary complexity of cellular and molecular mechanisms capable of mitigating ischemia/reperfusion injury (IRI). Among these, ischemic conditioning has emerged as one of the most influential discoveries: brief episodes of ischemia followed by reperfusion activate protective programs that reduce myocardial damage. These effects can be elicited locally (pre- or postconditioning) or remotely (remote conditioning), acting mainly through paracrine signaling and mitochondria-linked kinase pathways, with both early and delayed windows of protection. We have contributed to clarifying the roles of mitochondria, oxidative stress, prosurvival kinases, connexins, extracellular vesicles, and sterile inflammation, particularly via activation of the NLRP3 inflammasome. Despite robust preclinical evidence, clinical translation of these approaches has been disappointing. The challenges largely stem from experimental models that poorly reflect real-world clinical settings (e.g., advanced age, comorbidities, and multidrug therapy), as well as from reliance on surrogate endpoints that do not reliably predict clinical outcomes. Nevertheless, interest in multi-target protective strategies remains strong. New lines of investigation are focusing on emerging mediators, such as gasotransmitters, extracellular vesicles, and endogenous peptides, as well as on targeted modulation of inflammatory responses. Future perspectives point toward personalized cardioprotection tailored to patients’ metabolic and immune profiles, with special attention to high-risk populations in whom IRI remains a major clinical challenge.

33 pages, 15612 KiB

Open Access Article

A Personalized Multimodal Federated Learning Framework for Skin Cancer Diagnosis

by Shuhuan Fan, Awais Ahmed, Xiaoyang Zeng, Rui Xi and Mengshu Hou

Electronics 2025, 14(14), 2880; https://doi.org/10.3390/electronics14142880 - 18 Jul 2025

Abstract

Skin cancer is one of the most prevalent forms of cancer worldwide, and early, accurate diagnosis critically affects patient outcomes. Given the sensitive nature of medical data and its fragmented distribution across institutions (data silos), privacy-preserving collaborative learning is essential to enable knowledge sharing without compromising patient confidentiality. While federated learning (FL) offers a promising solution, existing methods struggle with heterogeneous and missing modalities across institutions, which reduce diagnostic accuracy. To address these challenges, we propose an effective and flexible Personalized Multimodal Federated Learning framework (PMM-FL), which enables efficient cross-client knowledge transfer while maintaining personalized performance under heterogeneous and incomplete modality conditions. Our study makes three key contributions: (1) A hierarchical aggregation strategy that decouples multi-module aggregation from local deployment via global modular-separated aggregation and local client fine-tuning. Unlike conventional FL, which synchronizes all parameters in each round, our method adopts a frequency-adaptive synchronization mechanism that updates parameters based on their stability and functional roles. (2) A multimodal fusion approach based on multitask learning that integrates learnable modality imputation and attention-based feature fusion to handle missing modalities. (3) A custom dataset combining multi-year International Skin Imaging Collaboration (ISIC) challenge data (2018–2024) to ensure comprehensive coverage of diverse skin cancer types. We evaluate PMM-FL under diverse experimental settings, demonstrating its effectiveness in federated learning with heterogeneous and incomplete modalities: it achieves 92.32% diagnostic accuracy, with only a 2% drop in accuracy under 30% modality missingness, and a 32.9% reduction in communication overhead compared with baseline FL methods.

(This article belongs to the Special Issue Multimodal Learning and Transfer Learning)
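
To make the contrast with conventional FL concrete, the sketch below performs FedAvg-style weighted averaging per module, but only for modules that are due for synchronization in the current round; the rest keep their local, personalized values. The module names and fixed per-module periods are hypothetical stand-ins: PMM-FL sets its synchronization frequencies adaptively from parameter stability and functional role, which is not modeled here.

```python
def weighted_average(vectors, weights):
    """Element-wise average of parameter vectors, weighted by sample counts."""
    total = sum(weights)
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(len(vectors[0]))
    ]


def aggregate_round(clients, sizes, sync_period, round_idx):
    """Synchronize only the modules that are due this round.

    clients: list of {module_name: list[float]}, one dict per client
    sizes: local sample counts, used as aggregation weights
    sync_period: {module_name: k}; a module is averaged every k rounds
    Modules that are not due keep their local (personalized) values.
    Returns the names of the modules that were synchronized.
    """
    synced = []
    for module, period in sync_period.items():
        if round_idx % period == 0:
            avg = weighted_average([c[module] for c in clients], sizes)
            for c in clients:
                c[module] = list(avg)
            synced.append(module)
    return synced
```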

20 pages, 2285 KiB

Open Access Article

WormNet: A Multi-View Network for Silkworm Re-Identification

by Hongkang Shi, Minghui Zhu, Linbo Li, Yong Ma, Jianmei Wu, Jianfei Zhang and Junfeng Gao

Animals 2025, 15(14), 2011; https://doi.org/10.3390/ani15142011 - 8 Jul 2025

Abstract

Re-identification (ReID) has been widely applied in person and vehicle recognition tasks. This study extends its application to a new domain: insect (silkworm) recognition. Unlike person or vehicle ReID, however, silkworm ReID presents unique challenges, such as high similarity between individuals, arbitrary poses, and significant background noise. To address these challenges, we propose a multi-view network for silkworm ReID, called WormNet, which is built on an innovative strategy termed extraction–purification–extraction–interaction. Specifically, we introduce a multi-order feature extraction module that captures a wide range of fine-grained features using convolutional kernels of varying sizes and parallel cardinality, effectively mitigating the issues of high individual similarity and diverse poses. Next, a feature mask module (FMM) purifies the features in the spatial domain, reducing the impact of background interference. To further enhance the network’s representation capability, we propose a channel interaction module (CIM), which combines an efficient channel attention network with global response normalization (GRN) in parallel to recalibrate features, enabling the network to learn crucial information at both local and global scales. Additionally, we introduce a new silkworm ReID dataset for network training and evaluation. Experimental results demonstrate that WormNet achieves an mAP of 54.8% and a rank-1 accuracy of 91.4% on this dataset, surpassing both state-of-the-art and related networks. This study offers a valuable reference for ReID in insects and other organisms.

(This article belongs to the Section Animal System and Management)
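
The reported mAP and rank-1 scores can be computed from a query-by-gallery distance matrix. This minimal sketch assumes that a smaller distance means more similar and omits the same-camera or junk-image filtering that common ReID evaluation protocols apply.

```python
def average_precision(ranked_labels, query_label):
    """Average precision of one query over its ranked gallery labels."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)  # precision at each true match
    return sum(precisions) / len(precisions) if precisions else 0.0


def reid_metrics(dist, query_ids, gallery_ids):
    """Rank-1 accuracy and mAP from a query x gallery distance matrix."""
    rank1_hits, aps = 0, []
    for q, row in enumerate(dist):
        order = sorted(range(len(row)), key=row.__getitem__)  # nearest first
        ranked = [gallery_ids[g] for g in order]
        rank1_hits += int(ranked[0] == query_ids[q])
        aps.append(average_precision(ranked, query_ids[q]))
    return rank1_hits / len(dist), sum(aps) / len(aps)
```

With two queries, one whose nearest gallery image is a true match (AP 1.0) and one whose only match sits at rank 2 (AP 0.5), rank-1 accuracy is 0.5 and mAP is 0.75.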

27 pages, 16258 KiB

Open Access Article

A Blockchain-Based Lightweight Reputation-Aware Electricity Trading Service Recommendation System

by Pingyan Mo, Kai Li, Yongjiao Yang, You Wen and Jinwen Xi

Electronics 2025, 14(13), 2640; https://doi.org/10.3390/electronics14132640 - 30 Jun 2025

Abstract

With the continuous expansion of users, businesses, and services in electricity retail trading systems, the demand for personalized recommendations has grown significantly. To address the reduced recommendation accuracy caused by insufficient data in standalone recommendation systems, the academic community has conducted in-depth research on distributed recommendation systems. However, this collaborative recommendation environment faces two critical challenges: first, how to effectively protect the privacy of data providers and power users during the recommendation process; second, how to handle malicious data providers who may supply false recommendation data, thereby compromising the system’s reliability. To tackle these challenges, a blockchain-based lightweight reputation-aware electricity retail trading service recommendation (BLR-ERTS) system is proposed, tailored for electricity retail trading scenarios. The system introduces a recommendation method based on Locality-Sensitive Hashing (LSH) to enhance user privacy protection. Additionally, a reputation management mechanism is designed to identify and mitigate malicious data providers, ensuring the quality and trustworthiness of the recommendations. The security characteristics and privacy-preserving capabilities of the proposed system are explored through theoretical analysis. Experimental results show that BLR-ERTS achieves an MAE of 0.52, an MSE of 0.275, and an RMSE of 0.52. Compared with existing baseline methods, BLR-ERTS improves MAE, MSE, and RMSE by approximately 13%, 14%, and 13%, respectively. Moreover, the system achieves 94% efficiency, outperforming comparable approaches by 4–24%, and remains robust under adversarial conditions, with an attack success rate of only 30%.
The findings demonstrate that BLR-ERTS not only meets privacy protection requirements but also significantly improves recommendation accuracy and system robustness, making it an effective solution for multi-party collaborative environments.
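
The abstract does not say which LSH family BLR-ERTS uses. One common choice, assumed here, is random-hyperplane LSH (SimHash): each bit records which side of a random hyperplane a vector falls on, so vectors separated by a small angle tend to share bits, and data providers could exchange short bit signatures instead of raw preference vectors. The function names and parameters are hypothetical. As a side check on the reported figures, RMSE is the square root of MSE, and sqrt(0.275) ≈ 0.524, consistent with the reported RMSE of 0.52.

```python
import random


def make_hyperplanes(dim, n_bits, seed=42):
    """Random Gaussian hyperplanes, shared by all participants (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        int(sum(h * x for h, x in zip(plane, vector)) >= 0.0)
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of differing bits; a small distance suggests similar users."""
    return sum(x != y for x, y in zip(a, b))
```

Because signatures depend only on direction, a user vector and any positive rescaling of it receive identical signatures, while its negation flips every bit.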

22 pages, 1038 KiB

Open Access Article

MEFL: Meta-Equilibrize Federated Learning for Imbalanced Data in IoT

by Jialu Tang, Yali Gao, Xiaoyong Li and Jia Jia

Entropy 2025, 27(6), 553; https://doi.org/10.3390/e27060553 - 24 May 2025

Abstract

In the Internet of Things (IoT), data distributions across diverse terminals exhibit substantial statistical heterogeneity. This imbalance can cause skew and accuracy degradation, ultimately impairing the generalization ability and robustness of Federated Learning (FL) models. Our work addresses these challenges by proposing a novel method, Meta-Equilibrized Federated Learning (MEFL), which integrates meta-learning with gradient-descent preservation and an equilibrated optimization aggregation mechanism based on gradient similarity and variance-weighted adjustment. By alleviating, at the source, the gradient biases caused by multi-step local updates, MEFL effectively resolves the inconsistency between global and local optimization objectives. MEFL optimizes the trade-off between local and global models and provides an efficient solution for secure cross-domain data deployment in IoT scenarios. Comprehensive experiments on real-world datasets demonstrate that MEFL improves final test accuracy by at least 3.26% and substantially lowers communication overhead compared with existing state-of-the-art baseline methods. The results show that MEFL offers superior performance and generalization capability in addressing personalization challenges with imbalanced non-IID data distributions.

(This article belongs to the Section Signal and Data Analysis)
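
The exact aggregation rule is not given in the abstract. The sketch below is a hypothetical rule in the same spirit: each client’s update is weighted by its non-negative cosine similarity to the mean update and down-weighted by its element-wise variance, and the weights are then normalized. It is not MEFL’s published formula.

```python
import math
from statistics import pvariance


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def similarity_variance_weights(updates):
    """Hypothetical weights: favour updates aligned with the mean direction
    and penalize high-variance (noisier) updates."""
    n = len(updates)
    mean_update = [sum(col) / n for col in zip(*updates)]
    raw = [
        max(0.0, cosine(u, mean_update)) / (1.0 + pvariance(u))
        for u in updates
    ]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else [1.0 / n] * n


def aggregate_updates(updates):
    """Weighted sum of client updates using the weights above."""
    w = similarity_variance_weights(updates)
    return [
        sum(wi * u[j] for wi, u in zip(w, updates))
        for j in range(len(updates[0]))
    ]
```

An update pointing against the consensus direction receives zero weight, so a single conflicting client cannot pull the aggregate backwards.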

29 pages, 5277 KiB

Open Access Article

Personalized Course Recommendation System: A Multi-Model Machine Learning Framework for Academic Success

by Md Sajid Islam and A. S. M. Sanwar Hosen

Digital 2025, 5(2), 17; https://doi.org/10.3390/digital5020017 - 22 May 2025

Abstract

The increasing complexity of academic programs and student needs calls for personalized, data-driven academic advising. Traditional heuristic-based methods often fail to optimize course selection, leading to inefficient academic planning and delayed graduation. This study introduces a hierarchical multi-model machine learning framework for personalized course recommendations that integrates five predictive models: the Success Probability Model (SPM), Course Fit Score Model (CFSM), Prerequisite Fulfillment Model (PFM), Graduation Priority Model (GPM), and Recommended Load Model (RLM). These models operate independently in a local model framework, generating specialized predictions that a global model framework synthesizes through a meta-function. The meta-function aggregates the predictions into a final score for each course and ensures that recommendations align with student success probabilities, program requirements, and workload constraints. It enforces key constraints, such as prerequisite satisfaction, workload optimization, and program-specific requirements, so that recommendations are both academically viable and institutionally compliant. The framework demonstrated strong predictive performance, with root mean squared error (RMSE) values of 0.00956, 0.011713, and 0.005406 for SPM, CFSM, and RLM, respectively. The classification models for PFM and GPM also achieved accuracy exceeding 99%. Designed for modularity and adaptability, the framework allows additional predictive models to be integrated and recommendation priorities to be fine-tuned to suit institutional needs. This scalable solution improves academic advising efficiency by transforming granular model predictions into personalized, actionable course recommendations, supporting students in making informed academic decisions.
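
The meta-function’s role, turning five model outputs into a ranked, constraint-respecting course list, can be sketched as below. The linear weights, the treatment of the PFM output as a hard filter, and the greedy credit cap standing in for the RLM output are all assumptions for illustration; the paper’s actual meta-function may differ.

```python
from dataclasses import dataclass


@dataclass
class CoursePrediction:
    """Per-course outputs of the local models (field names hypothetical)."""
    code: str
    credits: int
    success_prob: float         # SPM output, 0..1
    fit_score: float            # CFSM output, 0..1
    prereqs_met: bool           # PFM output
    graduation_priority: bool   # GPM output


def recommend(preds, max_credits, weights=(0.5, 0.3, 0.2)):
    """Hypothetical meta-function: filter, score, then fill the credit load."""
    w_spm, w_fit, w_grad = weights
    eligible = [p for p in preds if p.prereqs_met]  # hard prerequisite constraint
    scored = sorted(
        eligible,
        key=lambda p: w_spm * p.success_prob
        + w_fit * p.fit_score
        + w_grad * float(p.graduation_priority),
        reverse=True,
    )
    chosen, load = [], 0
    for p in scored:
        if load + p.credits <= max_credits:  # workload cap (RLM stand-in)
            chosen.append(p.code)
            load += p.credits
    return chosen
```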

37 pages, 2036 KiB

Open Access Article

GCN-Transformer: Graph Convolutional Network and Transformer for Multi-Person Pose Forecasting Using Sensor-Based Motion Data

by Romeo Šajina, Goran Oreški and Marina Ivašić-Kos

Sensors 2025, 25(10), 3136; https://doi.org/10.3390/s25103136 - 15 May 2025

Abstract

Multi-person pose forecasting involves predicting the future body poses of multiple individuals over time, a task characterized by complex movement dynamics and interaction dependencies. Its relevance spans various fields, including computer vision, robotics, human–computer interaction, and surveillance. The task is particularly important in sensor-driven applications, where motion capture systems, including vision-based sensors and IMUs, provide crucial data for analyzing human movement. This paper introduces GCN-Transformer, a novel model for multi-person pose forecasting that integrates Graph Convolutional Network and Transformer architectures. We introduce novel loss terms during training that enable the model to learn both interaction dependencies and the trajectories of multiple joints simultaneously. Additionally, we propose a novel pose forecasting evaluation metric, Final Joint Position and Trajectory Error (FJPTE), which assesses both local movement dynamics and global movement errors by considering the final position and the trajectory leading up to it, providing a more comprehensive assessment of movement dynamics. Our model combines scene-level graph-based encoding with personalized attention-based decoding, a novel architecture for multi-person pose forecasting that achieves state-of-the-art results across four datasets. The model is trained and evaluated on the CMU-Mocap, MuPoTS-3D, SoMoF Benchmark, and ExPI datasets, which were collected using sensor-based motion capture systems, supporting its applicability in real-world scenarios. Comprehensive evaluations on these datasets demonstrate that GCN-Transformer consistently outperforms existing state-of-the-art (SOTA) models according to the VIM and MPJPE metrics.
Specifically, on the MPJPE metric, GCN-Transformer improves on the closest SOTA model by 4.7% on CMU-Mocap, 4.3% on MuPoTS-3D, 5% on the SoMoF Benchmark, and 2.6% on ExPI. Unlike other models, whose performance fluctuates across datasets, GCN-Transformer performs consistently, demonstrating its robustness in multi-person pose forecasting and providing a strong foundation for its application in different domains.

(This article belongs to the Special Issue Deep Learning Applications for Pose Estimation and Human Action Recognition—2nd Edition)
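
MPJPE, one of the two metrics used above, is the mean Euclidean distance between predicted and ground-truth joint positions. This sketch averages over frames, people, and joints with no root alignment; conventions vary between papers, and the proposed FJPTE metric is not reproduced because its formula is not given in the abstract.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: nested lists indexed [frame][person][joint] -> (x, y, z).
    Returns the mean Euclidean distance over all frames, people, and joints.
    """
    total, count = 0.0, 0
    for f_pred, f_tgt in zip(pred, target):
        for p_pred, p_tgt in zip(f_pred, f_tgt):
            for j_pred, j_tgt in zip(p_pred, p_tgt):
                total += math.dist(j_pred, j_tgt)
                count += 1
    return total / count
```

For a single frame with one person and two joints, errors of 5 and 0 units give an MPJPE of 2.5.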

32 pages, 9504 KiB

Open Access Article

CSSA-YOLO: Cross-Scale Spatiotemporal Attention Network for Fine-Grained Behavior Recognition in Classroom Environments

by Liuchen Zhou, Xiangpeng Liu, Xiqiang Guan and Yuhua Cheng

Sensors 2025, 25(10), 3132; https://doi.org/10.3390/s25103132 - 15 May 2025

Abstract

Under a student-centered educational paradigm, project-based learning (PBL) assessment requires accurate identification of classroom behaviors to support effective teaching evaluation and personalized learning strategies. The increasing use of visual and multi-modal sensors in smart classrooms has made it possible to capture rich behavioral data continuously. However, challenges such as lighting variations, occlusions, and diverse behaviors complicate sensor-based behavior analysis. To address these issues, we introduce CSSA-YOLO, a novel detection network that incorporates cross-scale feature optimization. First, we design a C2fs module that captures spatiotemporal dependencies in small-scale actions, such as hand-raising, through hierarchical window attention. Second, a Shuffle Attention mechanism is integrated into the neck to suppress interference from complex backgrounds, enhancing the model’s ability to focus on relevant features. Finally, to further improve the detection of small targets and complex boundary behaviors, we adopt the WIoU loss function, which dynamically weights gradients to optimize the localization accuracy of occluded targets. Experiments on the SCB03-S dataset showed that CSSA-YOLO outperforms traditional methods, achieving an mAP₅₀ of 76.0% and surpassing YOLOv8m by 1.2%, particularly in complex background and occlusion scenarios. Furthermore, it reaches 78.31 FPS, meeting the requirements for real-time application. This study offers a reliable solution for precise behavior recognition in classroom settings, supporting the development of intelligent education systems.

(This article belongs to the Section Intelligent Sensors)
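
The abstract does not say which WIoU variant CSSA-YOLO uses. As a hedged illustration, the sketch below implements the basic Wise-IoU v1 form, where the IoU loss is scaled by a distance-attention factor R = exp(d² / (W_g² + H_g²)). Here d is the distance between box centres and W_g, H_g are the width and height of the smallest enclosing box. Later WIoU versions add a dynamic focusing coefficient, which is not modelled here. The function name `wiou_v1_loss` is hypothetical.

```python
import math


def wiou_v1_loss(pred, target):
    """Wise-IoU v1 loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Returns (loss, iou). Plain floats are used, so there is no autograd;
    in a training framework the enclosing-box denominator would be detached
    so that it does not receive gradients.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection-over-union of the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0
    l_iou = 1.0 - iou

    # Width and height of the smallest box enclosing both boxes.
    enc_w = max(px2, tx2) - min(px1, tx1)
    enc_h = max(py2, ty2) - min(py1, ty1)

    # Offset between the two box centres.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2

    # Distance-attention factor R >= 1: it amplifies the loss of predictions
    # whose centres lie far from the target centre.
    r_wiou = math.exp((dx * dx + dy * dy) / (enc_w ** 2 + enc_h ** 2))
    return r_wiou * l_iou, iou
```

For a perfectly aligned box the factor is exp(0) = 1 and the loss is 0. For offset boxes, the loss grows beyond the plain 1 − IoU value.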

19 pages, 3520 KiB

Open Access Article

Multi-Attribute Collaborative Optimization for Multimodal Transportation Based on User Preferences

by Youpeng Lu and Gang Gao

Appl. Sci. 2025, 15(10), 5512; https://doi.org/10.3390/app15105512 - 14 May 2025

Viewed by 418

Abstract

Given the differing interests and demands of the various participants in multimodal transportation, this paper proposes a multi-attribute decision-making method driven by user preferences. First, a four-dimensional optimization model is established with the objectives of minimizing transportation cost, transportation time, carbon emissions, and transportation risk. Second, to reflect practical operating conditions, differentiated time-window constraints are designed based on the continuous time windows of highway transportation, railway train schedules, and the arrival and departure characteristics of waterway vessels. To solve the model, an improved hybrid of the Genetic Algorithm (GA) and Aptenodytes Forsteri Optimization (AFO), denoted GA-AFO, is proposed, in which GA generates a high-quality initial population to accelerate convergence. The local search mechanism of AFO is strengthened by replacing the traditional gradient estimation strategy with a random mutation strategy based on a probability distribution. Finally, for the resulting multi-objective problem, a multi-attribute decision-making method is devised that reconciles the subjective preferences of decision makers with objective weights, yielding more scientifically grounded decisions. Numerical experiments show that the hybrid algorithm finds solutions quickly and is robust. The proposed decision-making method generates schemes tailored to the preferences of different decision makers, providing a scientific basis for formulating personalized transportation schemes.
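
The abstract does not specify how subjective preferences and objective weights are reconciled. The sketch below shows one common way to do this, under the following assumptions:

- All four criteria are cost-type and are min-max normalized.
- Objective weights come from the entropy weight method.
- The two weight vectors are blended linearly with a coefficient `alpha`.
- Alternatives are ranked by simple additive weighting.

All function names, the `alpha` parameter, and the route data are hypothetical and are not taken from the paper.

```python
import math


def normalize_costs(matrix):
    """Min-max normalize cost-type criteria so that 1 = best, 0 = worst."""
    norm_cols = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        span = hi - lo
        norm_cols.append([(hi - v) / span if span > 0 else 1.0 for v in col])
    # Transpose back so that rows are alternatives again.
    return [list(row) for row in zip(*norm_cols)]


def entropy_weights(norm):
    """Objective weights from the entropy weight method (needs >= 2 alternatives)."""
    m = len(norm)
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(len(norm[0])):
        col = [row[j] for row in norm]
        total = sum(col)
        p = [v / total for v in col] if total > 0 else [1.0 / m] * m
        entropy = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    s = sum(divergences)
    n = len(divergences)
    return [d / s for d in divergences] if s > 0 else [1.0 / n] * n


def rank_alternatives(matrix, subjective, alpha=0.5):
    """Blend subjective and objective weights, then rank by weighted sum."""
    norm = normalize_costs(matrix)
    objective = entropy_weights(norm)
    weights = [alpha * s + (1 - alpha) * o for s, o in zip(subjective, objective)]
    scores = [sum(w * x for w, x in zip(weights, row)) for row in norm]
    order = sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)
    return order, weights, scores


# Illustrative routes: [cost, time, carbon emissions, risk] (made-up numbers).
routes = [
    [100, 10, 5, 0.1],
    [120, 8, 6, 0.2],
    [150, 12, 7, 0.3],
]
order, weights, scores = rank_alternatives(routes, [0.4, 0.3, 0.2, 0.1], alpha=0.6)
```

Raising `alpha` shifts the ranking toward the decision maker's stated preferences. Lowering it lets the spread of the data decide which criteria matter most. This is one simple way to tailor schemes to different decision makers.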

16 pages, 1659 KiB

Open Access Article

DualPose: Dual-Block Transformer Decoder with Contrastive Denoising for Multi-Person Pose Estimation

by Matteo Fincato and Roberto Vezzani

Sensors 2025, 25(10), 2997; https://doi.org/10.3390/s25102997 - 9 May 2025

Viewed by 531

Abstract

Multi-person pose estimation is the task of detecting multiple people in a single image and regressing their keypoint coordinates. Significant progress has been made in recent years, especially since the introduction of transformer-based end-to-end methods. In this paper, we present DualPose, a novel framework that enhances multi-person pose estimation with a dual-block transformer decoder. Class prediction and keypoint estimation are split into parallel blocks, so each sub-task can be improved separately and the risk of interference between them is reduced. This architecture improves both the precision of keypoint localization and the model's ability to classify individuals correctly. Within the Keypoint-Block, self-attention operations are processed in parallel, a novel strategy that further improves keypoint localization accuracy and precision. In addition, DualPose incorporates a contrastive denoising (CDN) mechanism that uses positive and negative samples to stabilize training and improve robustness. CDN creates varied training samples by injecting controlled noise into the ground truth, which improves the model's ability to distinguish valid keypoints from incorrect ones. Extensive experiments on the MS COCO and CrowdPose datasets show that DualPose achieves state-of-the-art results, outperforming recent end-to-end methods. The code and pretrained models are publicly available.

(This article belongs to the Special Issue Deep Learning Applications for Pose Estimation and Human Action Recognition—2nd Edition)
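
The abstract does not describe how DualPose samples its denoising noise. The sketch below is modelled on the DINO-style contrastive denoising scheme for boxes, with two thresholds `lambda1 < lambda2`:

- Positive queries are ground-truth keypoints shifted by less than `lambda1`. They are trained to reconstruct the ground truth.
- Negative queries are shifted by a distance in `[lambda1, lambda2)`. They are trained to be rejected.

Applying the noise per keypoint, the threshold values, and the function name are all assumptions made for illustration.

```python
import numpy as np


def make_cdn_keypoint_queries(gt, lambda1=0.1, lambda2=0.3, rng=None):
    """Build positive and negative denoising queries from ground-truth keypoints.

    gt: array of shape (num_people, num_keypoints, 2) with normalized coordinates.
    Returns (positives, negatives, labels); labels are 1 for positive groups
    and 0 for negative groups.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One noise magnitude per keypoint, broadcast over the (x, y) axis.
    mag_shape = gt.shape[:-1] + (1,)

    def random_unit_vectors():
        # Random 2-D directions with unit length.
        v = rng.normal(size=gt.shape)
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    # Positives: small displacement, should be mapped back to the ground truth.
    pos_offset = random_unit_vectors() * rng.uniform(0.0, lambda1, size=mag_shape)
    # Negatives: larger displacement, should be classified as "no valid pose".
    neg_offset = random_unit_vectors() * rng.uniform(lambda1, lambda2, size=mag_shape)

    positives = gt + pos_offset
    negatives = gt + neg_offset
    labels = np.concatenate([np.ones(gt.shape[0]), np.zeros(gt.shape[0])])
    return positives, negatives, labels
```

Because both query sets are derived from the same ground truth, the decoder must learn a margin between "close enough" and "too far". This margin is the contrastive signal that the abstract credits with stabilizing training.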
