A Novel Hybrid Stacking Ensemble Classifier for the LegUp Robot Used in Lower Limb Rehabilitation

Iordan, Anca-Elena; Covaciu, Florin; Vaida, Calin; Nadas, Iuliu; Banica, Alexandru; Gherman, Bogdan; Ulinici, Ionut; Machado, Jose; Tucan, Paul; Pisla, Doina

doi:10.3390/ai7050177


by Anca-Elena Iordan ^1,3, Florin Covaciu ^2,3, Calin Vaida ^2,3,*, Iuliu Nadas ^2,3, Alexandru Banica ^2,3, Bogdan Gherman ^2,3,*, Ionut Ulinici ^2,3, Jose Machado ^2,3,4, Paul Tucan ^2,3 and Doina Pisla ^2,3,5

¹ Department of Computer Science, Technical University of Cluj-Napoca, 400027 Cluj-Napoca, Romania

² CESTER—Research Center for Industrial Robots Simulation and Testing, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania

³ European University of Technology, European Union

⁴ MEtRICs Research Centre, School of Engineering, University of Minho, Campus of Azurém, 4800-058 Guimarães, Portugal

⁵ Technical Sciences Academy of Romania, B-dul Dacia, 26, 030167 Bucharest, Romania

* Authors to whom correspondence should be addressed.

AI 2026, 7(5), 177; https://doi.org/10.3390/ai7050177 (registering DOI)

Submission received: 5 February 2026 / Revised: 16 April 2026 / Accepted: 7 May 2026 / Published: 21 May 2026

(This article belongs to the Section Medical & Healthcare AI)


Abstract

Robust exercise recognition is essential for robot-assisted lower-limb rehabilitation, where misclassifications of sensor-derived movements can degrade therapy execution and supervision. This study proposes a novel hybrid weighted stacking ensemble to increase the efficiency of the intelligent module of the LegUp parallel robotic system for lower limb rehabilitation. The approach combines a Residual Multilayer Perceptron (ResMLP) and an optimized Kernel Extreme Learning Machine (KELM), where model hyperparameters are tuned using Optuna and the base-model probability outputs are fused through optimized weighting and a meta-learner. Experiments were conducted on a five-class dataset built from nine IMU orientation features acquired from three sensors placed on the healthy limb. Four meta-learners were evaluated (Logistic Regression, Random Forest, Gradient Boosting, and AdaBoost), with AdaBoost providing the best overall performance. To further assess the robustness and generalization capability of the proposed approach, a 5-fold cross-validation procedure was performed for the ResMLP, KELM, and the hybrid ensemble models. The proposed stacking hybrid ensemble consistently surpassed the performance of the strongest individual classifiers as well as the original LegUp Multilayer Perceptron model. These results indicate that combining residual learning with kernel-based classification in a weighted stacking framework yields a stable and high-performing solution for multi-class rehabilitation exercise recognition.

Keywords:

Residual Multilayer Perceptron; Kernel Extreme Learning Machine; AdaBoost; hybrid stacking ensemble classifier; parallel robot; lower limb rehabilitation

1. Introduction

Technology-assisted motor rehabilitation has emerged as a major area of interest due to its potential to enhance functional outcomes for patients with locomotor impairments. Robotic systems combined with electromyography (EMG) offer opportunities for personalized and optimized therapy by enabling precise monitoring of muscle activity and real-time adaptation of interventions [1]. Despite the progress achieved, a key technical challenge remains: the reliable classification of rehabilitation exercises from low-dimensional Inertial Measurement Unit (IMU)-derived signals, which are characterized by nonlinear relationships and sensitivity to measurement noise. In this context, the existing LegUp classifier, based on a Multilayer Perceptron, exhibits limited representational capacity and may fail to capture complex movement patterns, leading to reduced robustness and occasional misclassifications.

The motivation for choosing this topic arises from the need to develop robust intelligent methods capable of combining the processing speed and generalization ability of machine learning algorithms with the ability of neural networks to capture complex relationships between EMG data and the sensory positions of the lower limbs. Specifically, hybrid architectures that integrate residual neural networks with kernel-based methods such as Kernel Extreme Learning Machine provide an opportunity to exploit the complementary strengths of each approach, thereby enhancing the performance and robustness of classification systems.

The purpose of this study is to design, implement, and evaluate a new hybrid stacking ensemble for the classification of robot-assisted motor exercises, combining a Residual Multilayer Perceptron with KELM. The proposed approach seeks to achieve high performance in terms of accuracy, precision, and generalization capability. Furthermore, this intelligent ensemble model is intended to support robotic rehabilitation systems by enhancing exercise recognition, thereby contributing to the advancement of technology-assisted therapy and improving the effectiveness of clinical rehabilitation interventions. The main contribution of this work lies in the development of a hybrid residual–kernel stacking framework tailored for low-dimensional IMU-based rehabilitation data. The study also provides a systematic comparative analysis of multiple MLP architectures and KELM variants, highlighting the advantages of residual learning in this context. In addition, the proposed ensemble is validated through cross-validation and statistical significance testing, which supports the reliability of the reported performance improvements.

To facilitate a clear and well-structured understanding of the presented research, the article is organized as follows:

The introductory section explains the motivation and context that led to the choice of the research topic. In particular, it highlights the limitations of the current LegUp exercise classifier, which, although effective, relies on a simplified Multilayer Perceptron (MLP) model that may struggle with complex or noisy IMU and EMG signals, leading to occasional misclassifications and reduced adaptability during therapy.
The second section provides an overview of the current state of research and developments in ensemble learning.
The third section presents the architecture of the LegUp system [1,2] and its existing intelligent model, the structure of the dataset used for training and validation, as well as the evaluation metrics applied, together with the Unified Modeling Language (UML) design of the intelligent classification module proposed in this study.
The fourth section details the performance analysis of the implemented Multilayer Perceptron neural networks, the developed KELM variants, and the hybrid learning ensemble proposed in this study, including a direct comparison with the original LegUp MLP model. In addition, a 5-fold cross-validation procedure is conducted for the ResMLP architecture, the KELM-RBF-Manhattan model, and the hybrid ensemble in order to further assess robustness and generalization performance. The section also reports the results of the Wilcoxon signed-rank test and the Friedman statistical test, used to evaluate the significance of the performance differences between the individual models and the proposed ensemble, both pairwise and overall.
Finally, the last section summarizes the main conclusions, emphasizing how the hybrid ensemble improves exercise recognition by enhancing classification accuracy, robustness, and generalization, thereby directly supporting more precise and adaptive robot-assisted rehabilitation.

2. Related Works

In the context of classification, ensemble learning improves the performance of an intelligent model by combining multiple base classifiers, leading to greater robustness and higher accuracy. Mienye et al. [3] discussed real-world applications of ensemble learning across various domains, highlighting its potential to address complex and diverse machine learning problems. Among the analyzed domains, the medical field stands out, where ensemble learning techniques are widely used to build robust classification models with enhanced accuracy.

Korial et al. [4] proposed a cardiovascular disease detection system that combines four base classifiers: Logistic Regression, Random Forest, Naïve Bayes, and K-Nearest Neighbors into a voting ensemble. The ensemble aggregates the individual predictions to generate the final decision, thereby improving robustness compared to the individual classifiers. Due to this ensemble structure combined with feature selection, the system achieves an accuracy of 92.11%, representing an average gain of approximately 2.95% over the best-performing base classifier. Majumder et al. [5] introduced another machine learning-based heart disease prediction ensemble. The ensemble learning structure is a concatenated design, combining two hybrid ensembles: the first includes SVM, Decision Tree, and K-NN, while the second includes Logistic Regression, AdaBoost, and Naïve Bayes. The predictions of the two ensembles are then combined in a voting classifier to produce the final decision. This multi-layer architecture achieves an accuracy of 86.89%, demonstrating the advantage of classifier diversity in improving performance.

Motamedi et al. [6] developed a diabetes diagnostic model based on ensemble learning, named BOGBEnsemble, which uses the GentleBoost algorithm optimized through Bayesian tuning. The architecture is a boosting ensemble with hyperparameters optimized via Bayesian Optimization, combined with a rigorous preprocessing and validation process. The optimized model is validated through cross-validation and compared with other intelligent algorithms. The results show a very high accuracy of approximately 99.26% for diabetes prediction, indicating that this learning ensemble is a reliable tool for diagnostic purposes. A stacked ensemble learning approach was implemented by Arya et al. [7] for breast cancer prognosis prediction, using multi-modal data. In the first stage, a convolutional neural network (CNN) automatically extracts relevant features from the heterogeneous datasets. In the second stage, the extracted features are integrated into a meta-learner that produces the final prediction. The final model achieves an accuracy of approximately 90.2%, demonstrating competitive performance in prognosis of the disease progression. The stacked ensemble structure combines the CNN ability to learn deep representations with the robustness of the Random Forest (RF) classifier used as the meta-model, resulting in more stable and accurate predictions for breast cancer prognosis.

Hemalatha et al. [8] designed a stacked ensemble learning model for stroke prediction, employing four base classifiers: Decision Tree, XGBoost, Random Forest, and Extra Trees. These base models are combined by a linear Support Vector Classifier (SVC) meta-learner, which produces the final prediction. The resulting model achieves an accuracy of 96.35%, significantly higher than the performance of the individual models. Thus, the integration of multiple machine learning algorithms enhances both the robustness and precision of stroke diagnosis prediction. Also, in the context of stroke prediction, Hossain et al. [9] configured an ensemble classifier that combines four algorithms: AdaBoost, Gradient Boosting Machine, Multilayer Perceptron, and Random Forest. The ensemble type used is a voting ensemble, where the four base models generate individual predictions, which are then combined through majority voting. The ensemble model achieves an accuracy of approximately 95% on the secondary dataset and approximately 80.36% on the primary dataset, demonstrating superior performance compared to the individual models while also providing interpretability of the decisions.

For thyroid disease detection, Obaido et al. [10] assembled a machine learning framework that combines filter-based feature selection with a stacked ensemble learning approach. Several base models are used: Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, and Gradient Boosting, and their predictions are combined by a Logistic Regression meta-learner to leverage the complementary strengths of each model. The proposed model achieved an approximate ROC-AUC (Receiver Operating Characteristic—Area Under the Curve) of 99.9% on the clinical test dataset, indicating capability in distinguishing between patients with and without thyroid disease.

Shakhovska et al. [11] presented a hybrid ensemble model that combines automatic feature selection with a three-level stacked classifier to evaluate COVID-19 severity and post-COVID rehabilitation duration. The stacked model uses weak predictors, while a Random Forest serves as the aggregator model to generalize the results. The hybrid system also incorporates associative rules, which contribute to improving classification quality. The achieved performance is very high, with an accuracy of 0.920 and an F1-score of 0.921, surpassing several reference individual models. Wang et al. [12] proposed a learning ensemble model for predicting motor function recovery in patients with traumatic cervical spinal cord injuries. The model combines multiple base classifiers at the first level, while a second level aggregates their predictions using AdaBoost to generate the final outcome. This two-level structure enables the capture of complex relationships between clinical features and functional results. Evaluations conducted on real clinical data demonstrate high accuracy and robustness, highlighting the utility of the ensemble in supporting rehabilitation decision-making.

Aonpong et al. [13] developed a method based on a ResMLP neural network that, starting from CT images and extracted features, implicitly estimates genetic expression data and uses these estimations to predict lung cancer recurrence. The method operates as a regression model for “genetic estimation,” followed by a classification model that anticipates recurrence. Experimental results show that the model achieves an accuracy of approximately 86.38%, significantly improving recurrence prediction compared to conventional imaging-based methods. Pacal [14] introduced an approach based on a modified Swin Transformer, in which the standard MLP block is replaced with a ResMLP, and the self-attention mechanism is enhanced with a specialized module called HSW MSA to capture both local details and global relationships in MRI images. The model was trained and evaluated on a public dataset of brain images classified into four classes, using transfer learning and data augmentation to improve robustness. The results show improved performance relative to previously reported methods, highlighting the effectiveness of the Swin Transformer and ResMLP combination for brain tumor diagnosis.

Chitra et al. [15] implemented a composite classification system that combines feature extraction from segmented CT images using UNet and AlexNet networks with a KELM classifier, whose hyperparameters are optimized through the meta-heuristic Dragonfly Optimization Algorithm. The model achieves very high performance, suggesting strong discriminative ability between patients with and without chronic obstructive pulmonary disease (COPD). The results indicate that this hybrid approach provides a robust and efficient classifier with real clinical potential for COPD diagnosis and screening. Another composite model was configured by Eliazer et al. [16] for non-invasive detection of neonatal jaundice based on biomedical images. After initial image filtering, a Vision Transformer (ViT) is employed to extract relevant features, while classification is performed using KELM. On a dedicated dataset of neonatal images, the method achieves an accuracy of approximately 96.97%, demonstrating superior performance compared to existing methods. The main conclusion is that the combination of ViT representational power and KELM generalization capability provides a robust and efficient system for non-invasive neonatal jaundice diagnosis, with real potential for clinical application.

In the context of existing research on AI-assisted medical rehabilitation, several open issues remain regarding the effective integration of deep representation learning with fast and highly generalizable classifiers capable of supporting timely clinical decision-making. While numerous studies investigate deep neural networks for feature extraction or Extreme Learning Machine-based models for classification, their unified use within a stacking ensemble framework remains largely unexplored, particularly for applications focused on medical rehabilitation. To address this gap, the present study proposes an original hybrid stacking ensemble that integrates the strong representational capacity of ResMLP with the computational efficiency and generalization capability of KELM. This combination aims to bridge the gap between theoretical predictive models and practical rehabilitation needs, providing a robust and efficient tool for personalized therapy planning and outcome assessment. Furthermore, the proposed approach opens new research directions for intelligent systems in medical rehabilitation and related healthcare applications.

3. Materials and Methods

3.1. Description of the LegUp Robotic System Architecture

The LegUp robotic system, presented in study [2] and described in patented solution [1], is designed for lower-limb rehabilitation. It combines a parallel robot for lower-limb rehabilitation with virtual reality-based therapy, utilizing both IMU BNO055 and EMG SEN0240 sensors. Movements of the healthy limb are captured by the IMU and replicated by the robot on the affected limb, providing “mirror” therapy. EMG signals detect movement intention and muscle fatigue, enabling adaptive, self-regulating control of the exercises. The LegUp robotic system [1], whose general architecture is illustrated in Figure 1, consists of the following main components:

A five-degree-of-freedom parallel robot that acts on the impaired limb to reproduce five movement types detected in the healthy limb: hip flexion/extension, hip adduction/abduction, knee flexion, ankle plantarflexion/dorsiflexion, and ankle eversion/inversion.
A set of three IMU sensors mounted on the healthy leg, which capture movements in real time and transmit the data to the C# application. This application then forwards the data to the Python 3.12.4 module responsible for classifying the type of exercise being performed.
A software application structured into three parts: an intelligent module (implemented in Python), a main module (implemented in C# 12), and a virtual reality interface (implemented in Unity 2022.3.4f1). The main module is used to monitor muscle activity and detect muscle fatigue. The virtual reality (VR) interface provides visual feedback, enhancing motivation and engagement. The intelligent module, designed using a simplified MLP variant, is responsible for classifying the type of exercise to be executed.

By integrating these components, the LegUp system enables the simultaneous execution of movements, providing synchronized, interactive, and adaptive rehabilitation designed to support the recovery of lower-limb motor functions.

3.2. Description of the Existing Intelligent Model

The analyzed rehabilitation system enables real-time transmission of data collected from IMU sensors via WiFi to a software application developed in C#. This application then forwards the data to a Python module, which classifies the exercises using a previously implemented and validated MLP neural network model. The MLP neural network is used to identify and classify the type of exercise performed by the patient with the healthy limb, based on orientation data provided by the IMU sensors.

The MLP neural network consists of an input layer, three hidden layers, and an output layer. The input layer receives nine features extracted from the data generated by the IMU sensors. The first hidden layer contains 16 neurons and uses the ReLU activation function, followed by a second hidden layer with 32 neurons, also using ReLU. The third hidden layer consists of 15 neurons and employs the SoftMax activation function. The output layer performs classification into five classes corresponding to the different types of exercises.
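To make the size of this network concrete, the following sketch counts the trainable parameters of the layer stack described above. It assumes plain fully connected layers with biases and, as an assumption not stated in the text, a separate 5-unit dense output layer after the 15-neuron layer; the helper names are illustrative only.

```python
# Layer widths taken from the description: 9 inputs, hidden layers of 16, 32, 15.
# ASSUMPTION: the output layer is a separate dense layer with 5 units.
LAYER_WIDTHS = [9, 16, 32, 15, 5]


def dense_param_count(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_param_counts(widths):
    """Per-layer trainable parameter counts for a stack of dense layers."""
    return [dense_param_count(a, b) for a, b in zip(widths, widths[1:])]


per_layer = mlp_param_counts(LAYER_WIDTHS)
total_params = sum(per_layer)
print(per_layer, total_params)
```

Under these assumptions the network has on the order of a thousand parameters, which is consistent with the text's description of it as a simplified model.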

The model was trained for 200 epochs, and its performance was evaluated using two metrics: accuracy, which reached approximately 0.916 during training and 0.929 during validation, and the loss function, with scores of 0.299 for training and 0.264 for validation. These results indicate an effective classification model that still leaves room for improvement. Therefore, in this study, multiple variants of MLP neural networks are trained with the aim of obtaining a robust and stable classification model, and the new models are evaluated using a broader set of metrics to ensure a comprehensive analysis of performance.

3.3. Dataset Structure

The aforementioned system uses three IMU sensors placed on the healthy limb, at the thigh, shank, and foot, each providing three orientation values. The dataset used in this study was originally introduced in our previous work [1]. These data are used to identify the type of exercise performed, among the five possible classes: hip abduction, hip flexion, knee flexion, ankle dorsiflexion, and ankle inversion. For the classification process, a dataset consisting of 1533 records is used, each containing 10 attributes: 9 numerical input features and one output variable indicating the exercise class. The meaning of each input feature, along with four statistical indicators (minimum, maximum, mean, and standard deviation), is presented in Table 1.

The 1533 records of the dataset include an additional piece of information that was not used as an explicit input feature during training or validation, namely the identifier indicating the participant from which each record comes. This information was used exclusively to ensure the correct allocation of records in either the training set or the validation set. The nine signals used as input variables in the classification process have numerical values characterized by these four statistical indicators, reflecting the variation and consistency of the recorded data. The mean values of the signals range from 1.4293 to 19.2262, with standard deviations between 0.0163 (Signal 7) and 0.1088 (Signal 2), indicating relatively low dispersion within the dataset. These characteristics contribute to defining the numerical profile of each exercise, facilitating the machine learning process.
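One common way to use a participant identifier for allocation is a group-wise split, in which all records of a participant go to the same subset. The sketch below illustrates that idea under the assumption that this is what the allocation rule means; the function name, the greedy filling strategy, and the toy identifiers are hypothetical and not taken from the study.

```python
import random
from collections import defaultdict


def split_by_participant(participant_ids, val_fraction=0.25, seed=0):
    """Return (train_idx, val_idx) so that no participant appears in both."""
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)

    pids = sorted(by_participant)
    random.Random(seed).shuffle(pids)

    target = val_fraction * len(participant_ids)
    train_idx, val_idx = [], []
    for pid in pids:
        # Fill the validation set first, one whole participant at a time.
        if len(val_idx) < target:
            val_idx.extend(by_participant[pid])
        else:
            train_idx.extend(by_participant[pid])
    return sorted(train_idx), sorted(val_idx)


# Hypothetical identifiers: 8 participants with 10 records each.
ids = [p for p in range(8) for _ in range(10)]
train_idx, val_idx = split_by_participant(ids)
train_people = {ids[i] for i in train_idx}
val_people = {ids[i] for i in val_idx}
print(len(train_idx), len(val_idx), train_people & val_people)
```

Because whole participants are moved at once, the realized validation share only approximates the requested fraction when participants contribute different numbers of records.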

Regarding the output variable, the 1533 records are distributed across five distinct classes as follows: 109 records belong to the Dorsiflexion class, 301 to Hip Abduction, 441 to Hip Flexion, 204 to Inversion, and 478 to Knee Flexion. The distribution of these classes is illustrated in Figure 2. For the classification process, the dataset was divided into two subsets: 75% of the records (1149 instances) were reserved for model training, while the remaining 25% (384 instances) were used for validation. As illustrated in Figure 3, the distribution of the five exercise type classes (Dorsiflexion, Hip Abduction, Hip Flexion, Inversion, and Knee Flexion) across the training and validation subsets reflects an approximately proportional allocation, with 84/25, 218/83, 346/95, 157/47, and 344/134 instances corresponding to each class in the two subsets, respectively.
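The reported training/validation pairs can be checked arithmetically. The short sketch below uses only the per-class pairs quoted above and recomputes the per-class totals, the subset sizes, and each class's validation share; the variable names are illustrative.

```python
# (training, validation) counts per class, as reported in the text.
SPLIT_COUNTS = {
    "Dorsiflexion": (84, 25),
    "Hip Abduction": (218, 83),
    "Hip Flexion": (346, 95),
    "Inversion": (157, 47),
    "Knee Flexion": (344, 134),
}

# Per-class totals implied by the pairs.
class_totals = {name: tr + va for name, (tr, va) in SPLIT_COUNTS.items()}
n_train = sum(tr for tr, _ in SPLIT_COUNTS.values())
n_val = sum(va for _, va in SPLIT_COUNTS.values())

# Share of each class that ends up in the validation subset.
val_share = {name: va / (tr + va) for name, (tr, va) in SPLIT_COUNTS.items()}

print(class_totals)
print(n_train, n_val, n_train + n_val)
for name, share in val_share.items():
    print(f"{name}: {share:.1%} in validation")
```

The pairs sum to 1149 training and 384 validation records, 1533 in total, and the per-class validation shares all lie between roughly one fifth and three tenths.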

3.4. Evaluation Metrics

The evaluation of the classification models proposed and implemented in this study is performed using four standard classification metrics: accuracy, macro precision, macro recall, and macro F1 score.

In the context of multi-class classification, accuracy is a metric used to quantify the proportion of correct predictions relative to the total number of predictions generated by the model, and is defined by Equation (1):

\mathrm{Accuracy} = \frac{\sum_{k=1}^{5} TP_{k}}{N}, \qquad (1)

where

TP_k represents the number of correctly classified examples in class k;
N represents the total number of instances in the dataset.

The evaluation of accuracy in a five-class classification context involves the following steps:

For each instance in the dataset (training or validation), the label predicted by the model is compared with the corresponding true label.
The total number of correct predictions is determined by counting the cases in which the predicted class exactly matches the true class.
The accuracy value is then calculated by dividing the number of correct predictions by the total number of analyzed instances.
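The three steps above can be sketched in a few lines of plain Python. The labels below are made-up integers in the range 0 to 4 standing in for the five exercise classes; they are not taken from the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    # Steps 1 and 2: compare label pairs and count exact matches.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Step 3: divide by the number of analyzed instances.
    return correct / len(y_true)


# Hypothetical labels for eight instances.
y_true = [0, 1, 2, 3, 4, 2, 1, 0]
y_pred = [0, 1, 2, 3, 3, 2, 0, 0]
acc = accuracy(y_true, y_pred)
print(acc)  # 6 of 8 labels match -> 0.75
```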

In the case of balanced datasets, another significant metric is macro precision. This metric provides an aggregated perspective on the model’s performance by calculating the arithmetic mean of the individual precisions obtained for each class. The calculation formula is defined in Equation (2):

\mathrm{Precision}_{macro} = \frac{1}{5} \sum_{k=1}^{5} \mathrm{Precision}_{k} = \frac{1}{5} \sum_{k=1}^{5} \frac{TP_{k}}{TP_{k} + FP_{k}}, \qquad (2)

where TP_k has the same meaning as in Equation (1), and FP_k represents the number of examples that, belonging to other classes, were incorrectly classified as part of class k.

The recall metric [17] evaluates the model's ability to correctly identify all instances belonging to a given class. This metric quantifies the proportion of correctly recognized elements relative to the total number of actual elements of that class. In the context of five-class classification with balanced datasets, macro recall is used, representing the arithmetic mean of the individual class recalls, as defined by the formula in Equation (3):

\mathrm{Recall}_{macro} = \frac{1}{5} \sum_{k=1}^{5} \mathrm{Recall}_{k} = \frac{1}{5} \sum_{k=1}^{5} \frac{TP_{k}}{TP_{k} + FN_{k}}, \qquad (3)


where TP_k has the same meaning as in Equation (1), and FN_k represents the number of examples that belong to class k but were incorrectly classified into another class. Thus, recall provides a relevant measure of the model performance in terms of correctly identifying the instances of each class, being particularly important in contexts where the cost of omissions is high.
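As an illustration of Equations (2) and (3), the sketch below derives TP_k, FP_k, and FN_k from a confusion matrix and averages the per-class precision and recall. The 5 × 5 matrix is invented for the example (rows are true classes, columns are predicted classes) and does not reproduce any result from the study.

```python
def per_class_counts(cm):
    """Return (TP, FP, FN) lists; cm[i][j] = class-i examples predicted as j."""
    n = len(cm)
    tp = [cm[k][k] for k in range(n)]
    # FP_k: column total minus the diagonal entry.
    fp = [sum(cm[i][k] for i in range(n)) - cm[k][k] for k in range(n)]
    # FN_k: row total minus the diagonal entry.
    fn = [sum(cm[k]) - cm[k][k] for k in range(n)]
    return tp, fp, fn


def macro_precision(cm):
    tp, fp, _ = per_class_counts(cm)
    return sum(t / (t + f) if t + f else 0.0 for t, f in zip(tp, fp)) / len(cm)


def macro_recall(cm):
    tp, _, fn = per_class_counts(cm)
    return sum(t / (t + f) if t + f else 0.0 for t, f in zip(tp, fn)) / len(cm)


# Hypothetical confusion matrix for five classes, 10 examples per class.
cm = [
    [8, 2, 0, 0, 0],
    [0, 10, 0, 0, 0],
    [0, 0, 5, 5, 0],
    [0, 0, 0, 10, 0],
    [2, 0, 0, 0, 8],
]
print(round(macro_precision(cm), 4), round(macro_recall(cm), 4))
```

For this matrix the per-class precisions are 0.8, 10/12, 1, 10/15, and 1, and the per-class recalls are 0.8, 1, 0.5, 1, and 0.8, so the macro values are 0.86 and 0.82.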

The F1 score [18] is a composite metric that balances the contribution of precision and recall. In the context of multi-class classification with five classes, this metric is calculated individually for each class as the harmonic mean of that class's precision and recall, and then aggregated at the global level, as defined by the formula in Equation (4):

\mathrm{F1\,score}_{macro} = \frac{1}{5} \sum_{k=1}^{5} \frac{2}{\frac{1}{\mathrm{Precision}_{k}} + \frac{1}{\mathrm{Recall}_{k}}} = \frac{1}{5} \sum_{k=1}^{5} \frac{2 \cdot TP_{k}}{2 \cdot TP_{k} + FP_{k} + FN_{k}}. \qquad (4)
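Equation (4) gives two equivalent expressions for the macro F1 score. The sketch below evaluates both on an invented 5 × 5 confusion matrix (rows are true classes, columns are predicted classes) to show that they agree numerically; the matrix and function names are illustrative and not taken from the study.

```python
def f1_macro_from_counts(cm):
    """Macro F1 via 2*TP / (2*TP + FP + FN) per class."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp
        fn = sum(cm[k]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


def f1_macro_harmonic(cm):
    """Macro F1 via the harmonic mean of per-class precision and recall."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))
        actual_k = sum(cm[k])
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        # The harmonic mean is undefined when either term is zero; use 0.
        if precision == 0.0 or recall == 0.0:
            scores.append(0.0)
        else:
            scores.append(2.0 / (1.0 / precision + 1.0 / recall))
    return sum(scores) / n


# Hypothetical confusion matrix for five classes.
cm_example = [
    [8, 2, 0, 0, 0],
    [0, 10, 0, 0, 0],
    [0, 0, 5, 5, 0],
    [0, 0, 0, 10, 0],
    [2, 0, 0, 0, 8],
]
f1_counts = f1_macro_from_counts(cm_example)
f1_harmonic = f1_macro_harmonic(cm_example)
print(round(f1_counts, 4), round(f1_harmonic, 4))
```

The count-based form is the safer one to implement, because it stays well defined when a class has zero precision or zero recall.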

3.5. UML Design of Intelligent Classification Software

In order to optimize the MLP model proposed in study [1], a new Python 3.12.10 application is designed in which several basic intelligent methods are analyzed and combined to obtain a new hybrid classification model. The system functionality was described using a UML use case diagram, which provides a structured representation of the main functional components of the application developed using the Python programming language. Employing a UML diagram allows highlighting the actors involved, their interactions with the application, and the operations performed within the intelligent classification process. Shown in Figure 4, this diagram was designed using the StarUML 3.2.2 tool [19]. According to the representation, the classification software involves two actors:

The human user, who controls the entire process of configuring, training, selecting, and evaluating the models.
The software application, developed in C# for robot control, which will use the newly proposed and trained hybrid classification model to make predictions.

This diagram includes 24 use cases, covering all system functionalities, from loading the dataset and selecting MLP or KELM architectures to optimizing parameters, training the models, validating them, generating hybrid ensembles, and saving the evaluation results.

The most important use cases, which together specify the functionality of the application, are: loading data from a CSV file, splitting the dataset, selecting the type of MLP, tuning parameters using Optuna 4.6.0, training and validating the models, building a hybrid stacking ensemble, and comparing model performances. At the structural level, six association relationships can be identified between the human-user actor and various use cases, reflecting the user's direct involvement in the configuration and evaluation stages. In addition, the C# software application actor is associated with a dedicated use case for loading and using the model for prediction.

Furthermore, the diagram contains five dependency relationships, which highlight the logical order of operations. For example, dataset splitting depends on loading the dataset, while model validation depends on the completion of the training process. At the same time, nine extension relationships are illustrated, used to detail specific behaviors within general use cases. Among these are the specializations for the types of MLP, as well as the variations in the new hybrid model proposed in this study.

Overall, the use case diagram provides a complete and structured perspective on how users interact with the intelligent classification software, while also emphasizing the system's modular architecture and the interdependencies among its components.

Starting from the specific functionalities of the Python software depicted in the use case diagram, 15 classes were identified, designed, and implemented to meet the proposed specifications. The UML class diagram [20], shown in Figure 5, illustrates the resulting architectural structure, comprising 2 abstract classes and 13 concrete classes, as well as the structural and behavioral relationships among them.

The first central component is the abstract class MLPClassifier, from which five concrete classes are derived, corresponding to the five MLP neural network architectures implemented and analyzed in this study. In the implementation of these five concrete classes, several classes defined in the Keras 3.11.0 [21] and TensorFlow 2.19.0 [22] libraries are used, providing the mechanisms required for building, training, and validating neural networks. Each of these classes implements the buildModel() method in accordance with the structural characteristics of the specific MLP architecture it represents, while the ResMLPClassifier class additionally introduces the residualBlock() method for defining residual blocks. The KELMClassifier class and its subclass KELMNormClassifier are responsible for implementing the kernel-based Extreme Learning Machine classifier. These classes provide methods for kernel computation, training, testing, and evaluation of KELM models, making use of the Scikit-learn [23] library.
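For readers unfamiliar with KELM, the following NumPy sketch shows the kind of kernel computation and closed-form training such a class typically performs. It is not the KELMClassifier from the class diagram: it assumes the standard KELM solution, beta = (K + I/C)^(-1) T with one-hot targets T, and an RBF-style kernel on the Manhattan distance. The class name, the values of C and gamma, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def manhattan_rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||_1)."""
    dist = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    return np.exp(-gamma * dist)


class KELMSketch:
    """Minimal kernel ELM (illustrative; C and gamma are assumed values)."""

    def __init__(self, C=10.0, gamma=0.1):
        self.C = C
        self.gamma = gamma

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T, shape (n_samples, n_classes).
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        K = manhattan_rbf_kernel(self.X_train, self.X_train, self.gamma)
        # Regularized closed-form solution: (K + I/C) beta = T.
        self.beta = np.linalg.solve(K + np.eye(K.shape[0]) / self.C, T)
        return self

    def decision_function(self, X):
        K = manhattan_rbf_kernel(np.asarray(X, dtype=float),
                                 self.X_train, self.gamma)
        return K @ self.beta

    def predict(self, X):
        scores = self.decision_function(X)
        return self.classes_[np.argmax(scores, axis=1)]


# Synthetic stand-in data: 5 well-separated classes, 9 features each.
rng = np.random.default_rng(0)
y_all = np.repeat(np.arange(5), 40)
centers = 3.0 * np.arange(5)[:, None] * np.ones((1, 9))
X_all = centers[y_all] + rng.normal(scale=0.3, size=(200, 9))

# Even rows for training, odd rows for testing.
model = KELMSketch().fit(X_all[::2], y_all[::2])
kelm_acc = float(np.mean(model.predict(X_all[1::2]) == y_all[1::2]))
print(kelm_acc)
```

Training reduces to a single linear solve, which is the source of the computational efficiency usually attributed to ELM-type models; the cost grows with the cube of the number of training samples, which is unproblematic at the scale of roughly a thousand records.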

Another central element of the diagram is represented by the abstract class HybridStackingWeightedClassifier, which defines the general mechanism of the weighted hybrid classifier introduced in this study. From this abstract class, 4 concrete classes are derived, corresponding to the four variants of hybrid stacking ensemble implemented and analyzed. Their implementation relies on models provided by the scikit-learn library: LogisticRegression, RandomForestClassifier, GradientBoostingClassifier and AdaBoostClassifier.

This abstract class is derived from the BaseEstimator class defined in sklearn.base, and during the scaling process it uses the StandardScaler class from sklearn.preprocessing. The abstract class HybridStackingWeightedClassifier is composed of two objects: one of type ResMLPClassifier and one of type KELMNormClassifier, as indicated by the two composition relationships between these classes.
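The general flow of a weighted stacking classifier of this kind can be sketched with scikit-learn alone. The example below is a simplified illustration under several assumptions, not the study's HybridStackingWeightedClassifier: LogisticRegression and an RBF SVC stand in for ResMLP and KELM, the fusion rule is assumed to be a weighted average of the two probability matrices, the weight w is fixed by hand rather than optimized, and the data are synthetic blobs rather than the LegUp dataset.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fuse(p_a, p_b, w):
    """Weighted average of two class-probability matrices (assumed rule)."""
    return w * p_a + (1.0 - w) * p_b


# Synthetic stand-in for a 9-feature, 5-class problem.
X, y = make_blobs(n_samples=600, centers=5, n_features=9, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features using statistics from the training subset only.
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_va_s = scaler.transform(X_tr), scaler.transform(X_va)

# Stand-ins for the two base learners (NOT ResMLP / KELM themselves).
base_a = LogisticRegression(max_iter=1000)
base_b = SVC(kernel="rbf", probability=True, random_state=0)

# Out-of-fold probabilities keep the meta-learner from seeing predictions
# that a base model made on its own training rows.
oof_a = cross_val_predict(base_a, X_tr_s, y_tr, cv=5, method="predict_proba")
oof_b = cross_val_predict(base_b, X_tr_s, y_tr, cv=5, method="predict_proba")

w = 0.5  # hypothetical fusion weight; the study tunes its weighting
meta = AdaBoostClassifier(random_state=0)
meta.fit(fuse(oof_a, oof_b, w), y_tr)

# Refit the base learners on the full training subset, then predict.
base_a.fit(X_tr_s, y_tr)
base_b.fit(X_tr_s, y_tr)
val_features = fuse(base_a.predict_proba(X_va_s),
                    base_b.predict_proba(X_va_s), w)
y_hat = meta.predict(val_features)
val_accuracy = float(np.mean(y_hat == y_va))
print(val_accuracy)
```

A weighted average is used here because tree-based meta-learners are insensitive to a simple rescaling of individual input columns, so a weight only has an effect when it changes how the two probability sources are mixed.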

OptunaOptimizer class, which uses Optuna library, is employed to identify the optimal values for the hyperparameters of the networks defined by MLPClassifier and KELMClassifier classes, as well as their derived classes. The automated optimization thus performed contributes to improving model performance and automating the hyperparameter selection process. Metric class defines four metrics specific to the multiclass classification problem, using sklearn.metrics module, and is used in the evaluation of all analyzed models.

Through inheritance, aggregation, and association relationships, the class diagram highlights the modularity of the software and the clear separation of responsibilities among its components. Moreover, the proposed architecture facilitates system expansion by allowing new classifiers or meta-models to be easily defined, in accordance with the SOLID object-oriented design principles [24].

4. Implementation and Results

In this section, the steps undertaken to implement the new intelligent module for the LegUp robotic system are described, along with the results obtained from experimental testing. The primary objective of this approach was to optimize the existing classifier to improve the accuracy in recognizing exercise types and enhance the overall performance of the system in the context of robot-assisted therapy. The processes of designing, configuring, and training different MLP network architectures, developing the Kernel Extreme Learning Machine classifier, and constructing a hybrid stacking ensemble are presented in detail. This strategy aimed to achieve a coherent integration with the LegUp platform, enabling the intelligent module to provide more robust and precise predictions, thereby directly contributing to the efficiency of the robot-assisted rehabilitation process.

The same dataset was intentionally employed to enable a direct comparison between the model previously reported in [1] and the models investigated in this study (MLP neural networks, KELM variants, and the newly proposed hybrid architecture), ensuring that any performance improvements can be attributed to the architectural design rather than to differences in data distribution.

4.1. Analysis of Implemented MLP Neural Networks

Among the various Multilayer Perceptron artificial neural network architectures, the following five variants were selected for implementation: Dense Multilayer Perceptron (DenseMLP), Dropout Multilayer Perceptron (DropoutMLP), Batch Normalization Multilayer Perceptron (BatchNormMLP), Wide Multilayer Perceptron (WideMLP), and Residual Multilayer Perceptron (ResMLP). The hyperparameter optimization process for these five selected MLP architectures was carried out using the Optuna framework [25], which provides an automated and efficient approach for exploring the hyperparameter search space.

4.1.1. Dense Multilayer Perceptron

DenseMLP [26] represents a classical MLP architecture, characterized by full connectivity between neurons, with each neuron in a layer connected to all neurons in the previous layer. This structure facilitates the complete propagation of information across layers. DenseMLP architecture used in this study is defined by an input layer consisting of 9 neurons (corresponding to the nine signals), hidden layers, and an output layer with 5 neurons corresponding to the classes: Dorsiflexion, HipAbduction, HipFlexion, Inversion, and KneeFlexion.

For optimizing the performance of the DenseMLP model, the hyperparameter tuning process was carried out using the Optuna framework, through the systematic exploration of 6 essential hyperparameters that influence both the network architecture and the dynamics of the training process. Each hyperparameter was analyzed within a predefined range of values, according to the specifications presented in Table 2:

The number of hidden layers was adjusted between 6 and 15, with the aim of analyzing the influence of network depth on the model generalization capacity and accuracy.
The size of each hidden layer was chosen from a discrete set of values {8, 12, 16, 20, 24, 28, 32} to evaluate the impact of network capacity on classification performance.
The training batch size was selected from the values 8, 16, 32, and 64, seeking to achieve an optimal compromise between the stability of gradient estimation and computational efficiency.
The learning rate was chosen from a discrete set of values {0.00001, 0.00005, 0.0001, 0.0005, 0.001} to identify the optimal weight update rate during training.
The number of training epochs was incrementally adjusted between 100 and 1000, with a step of 100, to allow identification of the appropriate training duration while avoiding overfitting.
The activation functions implemented in the hidden layers were chosen from a discrete set of four options: Rectified Linear Unit (ReLU), Gaussian Error Linear Unit (GELU), Exponential Linear Unit (ELU), and Hyperbolic Tangent (Tanh), each contributing differently to modeling non-linearities.
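The search space listed above can be written down compactly in code. The following is a minimal sketch that uses plain random search from the Python standard library as a stand-in for Optuna's sampler; the helper names (`sample_dense_mlp_config`, `random_search`) are hypothetical, and sampling a size and an activation independently for each hidden layer is an assumption based on the per-layer configuration reported for the final architecture.

```python
import random

# Value sets taken from the ranges listed above for DenseMLP.
LAYER_SIZES = [8, 12, 16, 20, 24, 28, 32]
BATCH_SIZES = [8, 16, 32, 64]
LEARNING_RATES = [0.00001, 0.00005, 0.0001, 0.0005, 0.001]
ACTIVATIONS = ["relu", "gelu", "elu", "tanh"]


def sample_dense_mlp_config(rng):
    """Draw one candidate configuration (hypothetical helper)."""
    n_layers = rng.randint(6, 15)  # both bounds inclusive
    return {
        # Assumption: size and activation are sampled per hidden layer.
        "layers": [
            (rng.choice(LAYER_SIZES), rng.choice(ACTIVATIONS))
            for _ in range(n_layers)
        ],
        "batch_size": rng.choice(BATCH_SIZES),
        "learning_rate": rng.choice(LEARNING_RATES),
        "epochs": rng.randrange(100, 1001, 100),  # 100, 200, ..., 1000
    }


def random_search(objective, n_trials, seed=0):
    """Plain random search, a simple stand-in for Optuna's sampler."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_dense_mlp_config(rng)
        score = objective(config)  # e.g., validation accuracy
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

In practice, `objective` would build and train the network for a given configuration and return its validation accuracy.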

By applying the Optuna framework, multiple architectural variants of DenseMLP model were compared in terms of accuracy, leading to the identification of an optimal hyperparameter configuration that ensures superior performance. Regarding the learning rate, lower values favor stable convergence but may require a greater number of epochs to achieve the desired performance. Conversely, higher learning rates accelerate the weight update process, but can lead to instability or overfitting. Optuna identified 0.0005 as the optimal learning rate, providing an efficient balance between convergence speed and training stability. The resulting architecture includes 13 hidden layers, each defined by a specific size and associated activation function, as follows: 28/ReLU, 20/ELU, 32/ELU, 16/GELU, 20/Tanh, 16/GELU, 16/Tanh, 16/GELU, 28/Tanh, 28/GELU, 32/ELU, 24/Tanh, and 16/ReLU. This structure reflects a balanced combination of depth, capacity, and functional diversity, designed to maximize the model generalization ability. Figure 6 illustrates the final neural network architecture, generated based on the optimal hyperparameter values.

Although the dataset size may appear relatively limited for training a deep DenseMLP architecture, several architectural and evaluation strategies were adopted to mitigate the risk of overfitting. The 13-layer DenseMLP configuration was intentionally included to investigate the influence of increased network depth on feature extraction capability. Despite its depth, the model relies on carefully optimized hyperparameters identified through systematic tuning, which help maintain stable learning dynamics and support adequate generalization performance.

The DenseMLP model was trained for 700 epochs, using a batch size of 16 and a learning rate of 0.0005, parameters identified as optimal through Optuna. During the training phase, the network achieved an accuracy of 0.91906, a precision of 0.93885, a recall of 0.88830, and an F1 score of 0.90781. In the validation phase, the model performance remained high, with an accuracy of 0.92188, a precision of 0.93245, a recall of 0.90346, and an F1 score of 0.91287, as presented in Table 3. These results reflect a relatively robust and balanced performance of the model in classifying the exercises.

4.1.2. Dropout Multilayer Perceptron

DropoutMLP [27] is a feedforward neural network architecture that incorporates the dropout regularization technique to reduce the risk of overfitting. This technique is applied after each dense layer within MLP network. Dropout mechanism helps improve the generalization ability of neural networks by introducing a form of stochastic regularization, in which a percentage of neurons is randomly deactivated at each training iteration. This random deactivation prevents the formation of excessive dependencies between neurons and limits their co-adaptation, forcing the network to learn more robust, redundant, and distributed representations of the data. During training, each neuron is kept with a probability p, and the rest are temporarily deactivated, which leads to the use of a different sub-model at each iteration. From a theoretical perspective, this process is interpreted as an implicit form of training an ensemble of subnetworks that share the same set of parameters.
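The random deactivation described above can be illustrated with a few lines of NumPy. This is a minimal sketch of inverted dropout, in which the surviving activations are rescaled by 1/p so that their expected value is unchanged; the function name and signature are hypothetical. Note that the dropout rate exposed by Keras is the probability of dropping a unit, i.e., 1 − p in the notation of the paragraph above.

```python
import numpy as np


def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout: keep each unit with probability keep_prob."""
    if not training:
        # At inference time all units are active and no rescaling is needed.
        return activations
    # A fresh random mask is drawn on every call, so each training
    # iteration effectively uses a different sub-network.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob
```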

Randomly deactivating neurons reduces the sensitivity of the network to noise in the data and to minor variations in the input features, originating from the IMU signals. To optimize the performance of the DropoutMLP model, hyperparameter tuning was performed using the Optuna framework through a systematic exploration of seven key hyperparameters that influence the network architecture. Six of these hyperparameters were also evaluated for the DenseMLP model, while the seventh, the dropout rate, is specific to DropoutMLP. The number of hidden layers was varied between 2 and 10, and for the other five common hyperparameters, the same ranges as for DenseMLP were maintained. The dropout rate was selected from the following discrete set of values: {0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5}, allowing exploration across a wide spectrum, from lower, conservative values that apply minimal regularization to higher, more aggressive values that enforce stronger regularization. This approach provides flexibility in identifying the optimal configuration to maximize model performance.

The final architecture of the DropoutMLP network identified with the Optuna framework (Figure 7) includes 3 hidden layers, each characterized by a specific size, an associated activation function, and a dropout rate: 20/Tanh/0.2, 24/Tanh/0.3, and 24/GELU/0.25. The DropoutMLP model was trained for 300 epochs using a batch size of 16 and a learning rate of 0.0001, parameters identified as optimal during the tuning process.

During the training phase, the model achieved an accuracy of 0.90601, a precision of 0.92540, a recall of 0.86729, and an F1 score of 0.88504. These values indicate that the model correctly classifies the majority of samples while maintaining a reasonable balance between correctly identifying classes and avoiding false positives. In the validation phase, the model performance was slightly higher in terms of accuracy and precision, with an accuracy of 0.92448, a precision of 0.94314, a recall of 0.86764, and an F1 score of 0.89128, as detailed in Table 3.

These results suggest that DropoutMLP model generalizes well to new data, maintaining high precision and stable recall, making it suitable for robust exercise classification.

4.1.3. Batch Normalization Multilayer Perceptron

BatchNormMLP neural network [28] is a multilayer perceptron where each dense layer is followed by batch normalization and an activation function, in order to stabilize the distribution of activations, accelerate training, and improve classification performance. Thus, the network architecture includes an input layer, an output layer, and a series of hidden blocks, each block consisting of three components: a dense layer, a batch normalization layer, and an activation layer. The role of the batch normalization layer becomes essential for the stable and efficient operation of the network. By normalizing the activations at the level of each mini-batch, this layer adjusts the distribution of each feature channel so that the activations are centered around zero and scaled to a unit variance. This operation reduces unwanted variations in the internal distributions of the activations and limits the phenomenon of internal covariate shift, maintaining a coherent information flow to subsequent layers. Stabilizing the intermediate distributions has direct effects on the optimization process: the gradient propagates more efficiently, the sensitivity to parameter initialization decreases, and the network can use higher learning rates without compromising convergence. In the context of motion classification based on IMU data, this normalization ensures that a consistent scale is maintained for the features associated with orientation and acceleration, preventing uncontrolled amplification or attenuation of certain signal components.
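The per-mini-batch normalization described above can be sketched as follows. This is a training-mode illustration only: the running statistics used at inference time are omitted, and the function name, the argument names, and the value of `eps` are choices made for this example rather than details taken from the implementation.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x has shape (batch_size, n_features); gamma and beta are the
    learnable per-feature scale and offset.
    """
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta
```

With `gamma` set to ones and `beta` to zeros, each output column has approximately zero mean and unit variance, whatever the scale of the input features.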

To optimize the performance of the BatchNormMLP model, hyperparameter tuning using Optuna was conducted through a systematic exploration of six key parameters that influence the network architecture and learning process: the number of blocks, the number of nodes in each dense layer, the type of activation function associated with each block, the training batch size, the learning rate, and the number of training epochs. Each hyperparameter was evaluated within a predefined range of values, according to the specifications in Table 2. The optimal values identified by Optuna, also presented in the same table, led to the configuration of a network whose architecture is illustrated in Figure 8. The implemented architecture consists of six hidden blocks, each composed of a Dense–Batch Normalization–Activation sequence, processing an input of nine features and producing a Softmax output across five classes. The model uses heterogeneous activations (ReLU, Tanh, GELU, ELU) and is optimized with Adam, promoting training stability and expressive internal representations.

The model was trained for 400 epochs using the optimal parameters identified: a batch size of 16 and a learning rate of 0.0001. The results demonstrate consistent performance between the training and validation sets, with an accuracy of 0.93473 (training) and 0.94010 (validation), a precision above 0.961 on both sets, a recall of 0.90482 (training) and 0.92033 (validation), and a macro F1 score of 0.92806 and 0.93612, respectively, as shown in Table 3. The model performance on the validation set slightly exceeds that of the training set, with accuracy higher by 0.54 percentage points and recall higher by 1.55 percentage points. Precision remains remarkably stable (a difference of only 0.08 percentage points), demonstrating consistency in avoiding false positives. The F1 score, as an aggregated metric, confirms this positive trend, showing an increase of 0.80 percentage points. This pattern of slightly higher performance on the validation data suggests the absence of overfitting and a robust generalization capability.

4.1.4. Wide Multilayer Perceptron

WideMLP neural network [29] represents an architecture in which the hidden layers are extremely wide, containing a very large number of neurons per layer, while the depth, defined as the total number of layers, remains low. Therefore, the representational capacity mainly comes from the width of the layers rather than from the compositional depth.

The large width implies weight matrices of considerable size, leading to significant memory consumption and increased computational complexity for each training step. In contrast to DenseMLP network, which employs numerous hidden layers of moderate size and emphasizes hierarchical and compositional data representation, WideMLP architecture is characterized by a reduced number of very wide layers. This favors diversity in the representations obtained through width, while reducing the level of successive abstraction that is typical of deep architectures.

In the hyperparameter tuning process, the same six hyperparameters used for DenseMLP architecture were employed; however, for two of them, different value ranges were used due to the structural characteristics of WideMLP network. Specifically, the number of hidden layers was explored within the range of 1–5, to reflect the reduced depth characteristic, and the size of each hidden layer was selected from a discrete set of large values typical for wide architectures: {512, 1024, 2048}. Although the dataset size may appear relatively limited for the WideMLP architecture, which includes layers with up to 2048 neurons, this model investigates an alternative design strategy focused on expanding representational capacity. Wide layers allow the network to capture complex nonlinear interactions among IMU-derived orientation features.

The optimal hyperparameter values provided by Optuna and presented in Table 2 highlight the particularities of wide architectures, leading to a WideMLP configuration composed of five hidden layers of large size, each employing a specific activation function, as illustrated in Figure 9. Training this model with a batch size of 32 and a learning rate of 0.001 for 500 epochs produces a classifier whose performance is summarized in Table 3. The WideMLP model demonstrates competitive performance, with accuracy, precision, recall, and F1-score values being comparable across both training and validation sets. The model achieves an accuracy of 0.9208 on the training set and 0.94271 on the validation set, while the consistent values of the associated metrics (precision, recall, and F1-score) indicate a stable and well-calibrated behavior.

4.1.5. Residual Multilayer Perceptron

ResMLP [30] represents a feed-forward architecture that does not employ attention mechanisms, relying instead on simple residual blocks. The implemented model consists of a sequence of residual blocks, each block applying layer normalization followed by two dense layers separated by a non-linear activation function. Residual connections are maintained through addition operations, and when dimensions do not match, a linear projection is used to align them. ResMLP architecture differs conceptually from the WideMLP approach through its representation learning strategy. WideMLP extends the modeling capacity by increasing the number of neurons in a single layer, allowing for the simultaneous capture of a large number of nonlinear interactions. This approach enlarges the representation space in a single transformation step. In contrast, ResMLP is based on a deep structure consisting of multiple residual blocks, in which representations are progressively refined from one layer to another. Each residual block learns incremental feature transformations, gradually building increasingly abstract representations. Residual connections facilitate efficient information propagation and maintain access to lower-level features, preventing gradient degradation in deep networks.

The optimization of the ResMLP model was carried out with Optuna, through a systematic exploration of seven essential hyperparameters that influence both the architecture and the learning process: the number of blocks, the number of nodes in each block, the activation function, the expansion factor specific to each block, the batch size, the learning rate, and the number of epochs. Six of these parameters were also evaluated for DenseMLP, while the expansion factor is specific to ResMLP.

The variation ranges and the optimal values identified by Optuna are presented in Table 2, and the resulting architecture is illustrated in Figure 10. The final structure includes seven consecutive residual blocks, described by the configuration {16/ReLU/3, 36/GELU/4, 52/ELU/5, 56/GELU/5, 48/ELU/4, 52/GELU/4, 16/ReLU/3}, where each tuple denotes the size of the Dense layer, the activation function, and the expansion factor used.

Each block thus integrates an internal expansion stage, which temporarily increases dimensionality to enhance representational capacity, followed by a projection that prepares the signal for the residual connection.
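A residual block of the kind described above can be sketched in NumPy as a forward pass only. The sketch assumes that the hidden width equals the block size multiplied by the expansion factor, that layer normalization has no learnable parameters, and that the shortcut projection has no bias; the helper names (`layer_norm`, `init_block`, `residual_block`) and the weight initialization are hypothetical and do not reproduce the residualBlock() method of the implementation.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each sample across its features."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def init_block(in_dim, units, expansion, rng):
    """Create the weights of one block (hypothetical initialization)."""
    hidden = units * expansion  # assumption: expanded width = units * factor
    params = {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, units)),
        "b2": np.zeros(units),
    }
    if in_dim != units:
        # Linear projection that aligns the shortcut with the block output.
        params["proj"] = rng.normal(0.0, 0.1, (in_dim, units))
    return params


def residual_block(x, params, activation=np.tanh):
    """LayerNorm -> Dense (expand) -> activation -> Dense (project) -> add."""
    h = layer_norm(x)
    h = activation(h @ params["w1"] + params["b1"])
    h = h @ params["w2"] + params["b2"]
    shortcut = x @ params["proj"] if "proj" in params else x
    return shortcut + h
```

For example, a first block created with `init_block(9, 16, 3, rng)` maps the nine input features to 16 units through a 48-unit expansion and uses the projection on the shortcut, since 9 differs from 16.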

The model was trained with a batch size of 16, a learning rate of 0.0005, and a total of 600 epochs. The results summarized in Table 3 indicate good generalization performance: F1 score is even slightly higher on the validation set (0.94523) compared to the training set (0.93562), indicating robust adaptation to new data. Both precision and recall, each above 0.928, reflect a balanced classification behavior, without tendencies toward under-prediction or over-prediction.

The proposed ResMLP architecture follows the residual learning principle, in which each block learns a refinement of the input representation rather than an entirely new transformation. Specifically, each residual block combines the original input features with the learned nonlinear mapping, allowing the network to progressively enhance the representation through incremental adjustments. This design facilitates optimization, particularly in deeper configurations, by preserving essential low-level information through identity pathways.

The use of seven consecutive residual blocks increases the representational depth of the network while maintaining training stability. Due to the presence of shortcut connections, gradients can propagate both through the learned transformations and directly through the identity mappings during backpropagation.

This dual propagation mechanism mitigates gradient attenuation across layers and ensures that feature information remains accessible throughout the entire depth of the architecture. Consequently, efficient feature reuse and stable convergence are maintained even with multiple stacked blocks.
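The dual propagation path can be made explicit with a short derivation. The notation ($x_l$ for the input of block $l$, $F_l$ for its learned transformation, $\mathcal{L}$ for the loss) is introduced here for illustration and assumes identity shortcuts; where a linear projection aligns dimensions, the identity term is replaced by the projection matrix.

```latex
x_{l+1} = x_l + F_l(x_l)
\quad\Longrightarrow\quad
x_L = x_l + \sum_{i=l}^{L-1} F_i(x_i),
\qquad
\frac{\partial \mathcal{L}}{\partial x_l}
  = \frac{\partial \mathcal{L}}{\partial x_L}
    \left( I + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F_i(x_i) \right).
```

The identity term $I$ carries the gradient from the output directly back to block $l$, regardless of how small the second term becomes.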

Regarding the expansion factor, each residual block temporarily increases the dimensionality of the feature space before projecting it back to the original size. This expansion enables the model to explore higher-dimensional latent representations, capturing complex nonlinear interactions among IMU-derived orientation features and cross-axis correlations. The subsequent compression stage restores dimensional compatibility with the residual pathway while preventing uncontrolled parameter growth. The selected expansion factors, ranging from 3 to 5 across the seven blocks, reflect a balance between representational capacity and computational efficiency. While larger expansion improves expressive power, it also increases parameter count and computational cost. The Optuna-based optimization process identified a configuration that achieves strong generalization performance while maintaining architectural efficiency and controlled complexity.
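As a rough illustration of this trade-off, the parameter count of a single block can be written out. The calculation assumes a block whose input and output both have width $d$, whose hidden width is $e \cdot d$ for an expansion factor $e$, and which uses biases on both dense layers; normalization parameters and any shortcut projection are ignored. These are assumptions made for the example, not details confirmed by the text.

```latex
\underbrace{d \cdot ed + ed}_{\text{expansion}}
  + \underbrace{ed \cdot d + d}_{\text{projection}}
  = 2ed^{2} + ed + d,
\qquad
d = 16,\ e = 3:\quad 2 \cdot 3 \cdot 16^{2} + 48 + 16 = 1600 .
```

Under the same assumptions, the count grows linearly with $e$ and quadratically with $d$.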

4.1.6. Performance Comparison of MLP Architectural Variants

The comparative analysis of the five MLP variants, based on Figure 11, highlights clear differences in terms of performance and generalization capability. DenseMLP and WideMLP provide stable results but do not particularly excel in macro-level scores, while DropoutMLP demonstrates good generalization but with a lower recall, suggesting reduced sensitivity to certain classes.

BatchNormMLP stands out with a solid balance between precision, recall, and F1 score, benefiting from the additional stability provided by batch normalization. Remarkably, ResMLP model achieves the best results in both the training and validation phases, consistently outperforming the other models in terms of accuracy, recall, and F1 score. The presence of residual connections appears to facilitate efficient information propagation and reduce performance degradation in deeper networks, giving this model the most robust and stable performance among all the variants analyzed.

4.2. Kernel Extreme Learning Machine

KELM [31] is a powerful machine learning method that combines the simplicity and fast training speed of ELM (Extreme Learning Machine) with the generalization strength of kernel-based approaches, offering a robust and highly competitive algorithm for classification and regression problems. The implemented KELM network is a model with a single hidden layer in which data transformation is implicitly performed through a positive-definite kernel function.

Instead of training internal weights, the model constructs a Gram matrix [32] between all samples and solves an analytical ridge regression equation, which results in very low training time. For multi-class classification, labels are encoded in one-hot format, and prediction is performed by multiplying the kernel between new data and training data with the learned coefficients. The hyperparameters of the KELM model are automatically selected by Optuna to maximize model performance. The kernel type is chosen categorically from three options: Radial Basis Function (RBF), linear and polynomial, defining the data projection space. The regularization parameter $C$ is optimized on a logarithmic scale within the range $[10^{-3}, 10^{3}]$, controlling the balance between training error and model complexity. The parameter $\gamma$ is searched within the range $[10^{-4}, 10^{1}]$ for both RBF and polynomial kernels, influencing the shape of the kernel function. For the polynomial kernel, the polynomial degree ($d \in \{2, 5\}$) and the shift term ($\mathrm{coef0} \in [0, 1]$) are additionally optimized, adjusting the flexibility of the basis function.
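The closed-form training step described above can be sketched with NumPy. The sketch assumes the standard KELM solution $\beta = (K + I/C)^{-1} T$, where $K$ is the Gram matrix and $T$ the one-hot target matrix, together with an RBF kernel; the function names are hypothetical and do not correspond to the methods of the KELMClassifier class.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def kelm_fit(x_train, y_train, n_classes, c, gamma):
    """Solve (K + I / C) beta = T for the output coefficients beta."""
    targets = np.eye(n_classes)[y_train]        # one-hot encoding
    gram = rbf_kernel(x_train, x_train, gamma)  # Gram matrix, shape (N, N)
    ridge = np.eye(gram.shape[0]) / c           # regularization term I / C
    return np.linalg.solve(gram + ridge, targets)


def kelm_predict(x_new, x_train, beta, gamma):
    """Kernel between new and training samples times the coefficients."""
    scores = rbf_kernel(x_new, x_train, gamma) @ beta
    return scores.argmax(axis=1)
```

No iterative optimization is involved: training reduces to a single linear solve whose cost is governed by the number of training samples.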

Hyperparameter optimization using the Optuna [25] framework led to the identification of an optimal configuration for the KELM model, in which the use of an RBF kernel maximizes classification performance. The experimental results, presented in Table 4, show that, compared to the linear and polynomial kernels, the RBF kernel achieves the highest values across all evaluation metrics, both in the training and validation phases. Thus, the model with RBF kernel reaches an accuracy above 0.955, consistently outperforming the alternatives with superior values of precision, recall, and F1 score, highlighting its ability to capture nonlinear relationships in the data and generalize effectively. For a visual comparison of the metric values for both training and validation, a graphical representation is provided in Figure 12.

These metric results confirm that the flexible structure of the RBF kernel is the most suitable for this problem, leading to the optimal overall performance of the KELM model. The efficiency of this kernel highlights the nonlinear nature of the relationships in the data and demonstrates the benefits of radial projection in feature space. The regularization parameter C = 830.74 lies near the upper end of the search range, which in the usual KELM formulation corresponds to weak regularization (a high penalty on training errors) and favors a close fit to the data structure, while the value γ = 3.066 suggests a moderate-to-high granularity in separating instances. The RBF function [33] used as the kernel has the following general form:

$$K(x_{i}, x_{j}) = e^{-\gamma \|x_{i} - x_{j}\|^{2}}, \tag{5}$$

where

$x_{i} = (x_{i1}, x_{i2}, \dots, x_{i9}) \in \mathbb{R}^{9}$ and $x_{j} = (x_{j1}, x_{j2}, \dots, x_{j9}) \in \mathbb{R}^{9}$ denote the feature vectors of the $i$-th and $j$-th samples ($i, j = 1, \dots, N$), where $N$ is the number of available data samples;
$\|x_{i} - x_{j}\| = \sqrt{\sum_{k=1}^{9} (x_{ik} - x_{jk})^{2}}$ denotes the Euclidean distance between the $i$-th and $j$-th input feature vectors;
$\gamma$ is the coefficient that controls the spread of the Gaussian kernel.

Although the RBF kernel implicitly relies on the Euclidean norm, in this study four other norms (Manhattan, Chebyshev, Canberra, and Bray-Curtis) were also investigated, with the aim of evaluating whether they could lead to better results than those obtained using the Euclidean norm.

The Manhattan norm [34] between two feature vectors is defined as follows:

$$\|x_{i} - x_{j}\| = \sum_{k=1}^{9} |x_{ik} - x_{jk}| \tag{6}$$

The Chebyshev norm [35] between two input feature vectors is expressed as follows:

$$\|x_{i} - x_{j}\| = \max_{k = 1, 2, \dots, 9} |x_{ik} - x_{jk}| \tag{7}$$

The Canberra distance [36] between two feature vectors is computed as follows:

$$D(x_{i}, x_{j}) = \sum_{k=1}^{9} \frac{|x_{ik} - x_{jk}|}{|x_{ik}| + |x_{jk}|} \tag{8}$$

The Bray-Curtis distance [37] between two feature vectors is given by the following formula:

$$D(x_{i}, x_{j}) = \frac{\sum_{k=1}^{9} |x_{ik} - x_{jk}|}{\sum_{k=1}^{9} (|x_{ik}| + |x_{jk}|)} \tag{9}$$
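Equations (6)–(9) translate directly into NumPy, as sketched below. The function names are hypothetical. Two points are assumptions rather than details stated in the text: Canberra terms with a zero denominator are counted as zero (a common convention), and the alternative distance is substituted for the Euclidean norm inside Equation (5) with the square retained.

```python
import numpy as np


def manhattan(a, b):
    """Equation (6): sum of absolute coordinate differences."""
    return np.sum(np.abs(a - b))


def chebyshev(a, b):
    """Equation (7): largest absolute coordinate difference."""
    return np.max(np.abs(a - b))


def canberra(a, b):
    """Equation (8); terms with a zero denominator are counted as 0."""
    num = np.abs(a - b)
    den = np.abs(a) + np.abs(b)
    ratio = np.divide(num, den, out=np.zeros(num.shape), where=den > 0)
    return np.sum(ratio)


def bray_curtis(a, b):
    """Equation (9): total absolute difference over total magnitude."""
    return np.sum(np.abs(a - b)) / np.sum(np.abs(a) + np.abs(b))


def rbf_with_distance(a, b, gamma, distance):
    """RBF-style kernel value with a pluggable distance function.

    Assumption: the chosen distance replaces the Euclidean norm in
    Equation (5) and is squared in the same way.
    """
    return np.exp(-gamma * distance(a, b) ** 2)
```

For the vectors (1, 2, 3) and (2, 0, 3), the four functions return 3, 2, 4/3, and 3/11, respectively.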

The performance of the KELM model that uses the RBF kernel and the Manhattan norm (KELM-RBF-Manhattan) stands out through its high and well-balanced metric values, as shown in Table 5, confirming the consistency of the results across both the training and validation sets. During training, this model achieves an accuracy of 0.95561, a precision of 0.94869, a recall of 0.94789, and an F1 score of 0.94736. On the validation set, the values are even higher: an accuracy of 0.96094, a precision of 0.95508, a recall of 0.96001, and an F1 score of 0.95652. These results confirm a robust generalization capability, as the differences between the training and validation phases are minimal, and the metric values remain consistently within approximately the 0.94–0.96 interval. The balance among precision, recall, and F1 score demonstrates that the model correctly classifies the majority of instances without signs of overfitting. For a visual comparison of the metric values for both training and validation, a graphical representation is provided in Figure 13.

Overall, the combination of the nonlinear RBF kernel with the Manhattan norm and the optimized hyperparameters ensures high stability and maximal discriminative capability of the classifier.

4.3. The Novel Hybrid Stacking Ensemble

To improve the metric values obtained by both the ResMLP model and the KELM-RBF-Manhattan model, a hybrid stacking ensemble model was designed and implemented. Stacking ensemble is an advanced method for combining multiple learning algorithms [38], in which their predictions are used as inputs for an additional model called the meta-learner. The base models (level-0 learners) may belong to completely different families (neural networks, SVMs, decision trees, probabilistic models), which maximizes the diversity of predictions. The meta-learner (level-1 learner) is trained to find an optimal function that maps the concatenated vector of predictions to the true label.

The proposed architecture, illustrated in Figure 14, implements a Hybrid Weighted Stacking Ensemble, in which two models (ResMLP and KELM-RBF-Manhattan) act as level-0 estimators, generating probability distributions that are subsequently used as meta-features by a meta-learner. The meta-learner receives as input the concatenation of the probabilities provided by the base models, each weighted by a coefficient determined through optimization on a validation set. The weight optimization is performed via a grid search that maximizes the F1 score, estimating the optimal linear combination of the predictions from the two models. This mechanism positions this ensemble in an intermediate category between weighted averaging and stacked generalization.
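The weighting and concatenation steps described above can be sketched as follows. The sketch assumes that the two weights are constrained to sum to one (so a single value `w` is searched), that the grid has 21 evenly spaced points, and that the macro-averaged F1 score is the selection criterion; the function names are hypothetical, and the meta-learner itself is not shown.

```python
import numpy as np


def macro_f1(y_true, y_pred, n_classes):
    """Macro-averaged F1 score computed from per-class counts."""
    scores = []
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def search_weight(p_res, p_kelm, y_val, steps=21):
    """Grid search over w in [0, 1] for the blend w*p_res + (1-w)*p_kelm."""
    n_classes = p_res.shape[1]
    best_w, best_f1 = 0.0, -1.0
    for w in np.linspace(0.0, 1.0, steps):
        blended = w * p_res + (1.0 - w) * p_kelm
        f1 = macro_f1(y_val, blended.argmax(axis=1), n_classes)
        if f1 > best_f1:
            best_w, best_f1 = float(w), f1
    return best_w, best_f1


def build_meta_features(p_res, p_kelm, w):
    """Concatenate the weighted class probabilities of both base models."""
    return np.hstack([w * p_res, (1.0 - w) * p_kelm])
```

For the five-class problem, `build_meta_features` yields ten columns per sample, which would then be standardized and passed to the meta-learner.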

The meta-features are then standardized, ensuring statistical compatibility between models that produce different distributions. For the meta-learner, four standard models are used: Logistic Regression (LR), Random Forest (RF), Gradient Boosting (GB), and AdaBoost (Ada). LR model [39] acts as a linear decision model in the probability space, making it optimal for data with approximately linear relationships between the predictions of the base models. RF [40] and GB [41] models provide the meta-learner with the ability to capture complex nonlinear interactions between the probabilities of the two base models. AdaBoost model [42] uses a weak decision tree as the primary learner and adjusts instance reweighting at the meta-level, enhancing sensitivity to difficult examples. The complete process uses the two pre-trained base models, optimizes the weights, constructs the meta-features, scales the data, and trains the meta-model within a coherent stacking pipeline. The final model is saved as a file containing the trained meta-model, the scaler, and the optimal weights, facilitating later loading and inference without the need to retrain the base models. The four variants of the hybrid ensemble were evaluated using the same four metrics presented earlier, and the results obtained are summarized in Table 6.

The hybrid variant using LR as the meta-learner (denoted ResMLP+KELM+LR) demonstrates consistent performance between the training and test sets, with an accuracy above 0.96, indicating good generalization and a low risk of overfitting. The precision and F1 score of the ResMLP+KELM+LR ensemble are very close, suggesting a balance between false positive and false negative rates. Its recall is slightly higher than its precision, highlighting that the model correctly identifies the majority of positive examples.

The hybrid variant using RF as the meta-model (denoted ResMLP+KELM+RF) achieves a slight improvement over ResMLP+KELM+LR, with an accuracy above 0.965 and an F1 score of 0.95614 on the validation set, indicating better discriminative capability. Its high precision on validation (0.96363), combined with a recall of 0.95365, shows that the model reduces the false positive rate without compromising the detection of positive classes. Compared to the ResMLP+KELM+LR ensemble, ResMLP+KELM+RF demonstrates a better balance between precision and recall, resulting in a higher F1 score.

The hybrid variant with GB as the meta-model (denoted ResMLP+KELM+GB) is characterized by high precision in training (0.98014) but lower recall (0.92835), suggesting a possible bias toward the negative class or easily classified examples. During validation, this variant maintains high performance, with an F1 score of 0.95503, confirming the model's robustness. However, the gap between precision and recall indicates a slight asymmetry in classification.

The hybrid variant using AdaBoost [43] as the meta-model (denoted ResMLP+KELM+Ada) outperforms all other models, achieving exceptional performance on both training and validation sets, with an F1 score above 0.981 and accuracy close to 0.99. This performance suggests a superior ability to capture data complexity and good generalization without significant overfitting.

Overall, all hybrid stacking ensembles exhibit high performance, as illustrated in Figure 15. However, the ResMLP+KELM+Ada ensemble demonstrates the best balance between precision and recall. In conclusion, although all hybrid stacking ensembles can be considered effective, the ResMLP+KELM+Ada ensemble represents the optimal choice for applications requiring a maximum F1 score and stability between the training and validation phases.

4.4. Comparative Analysis

To reinforce the conclusion that the ResMLP+KELM+Ada hybrid stacking ensemble is significantly superior to the MLP model presented and used in study [1], as well as to the base models ResMLP and KELM-RBF-Manhattan, a comparison was carried out using four metrics: accuracy, precision, recall, and F1 score. Figure 16 shows the performance of these four classification models. For all metrics, the original MLP model exhibits the lowest values, indicating a limited generalization capability. The ResMLP classifier clearly outperforms MLP, suggesting that the residual architecture provides significant benefits. The KELM-RBF-Manhattan classifier delivers performance comparable to ResMLP, reflecting the efficiency of kernel-based methods.

The ResMLP+KELM+Ada model, representing the novel hybrid stacking ensemble proposed in this study, achieves the best performance across all evaluation metrics, approaching near-optimal classification behavior. Compared to the baseline MLP model, the proposed ensemble demonstrates a substantial improvement in test accuracy, increasing from 0.9297 to 0.9844, which corresponds to an absolute gain of 5.47 percentage points. Similarly, the F1 score improves from 0.9151 to 0.9813, yielding an absolute increase of 6.62 percentage points. In addition, notable gains are observed in precision and recall, with precision increasing from 0.9263 to 0.9812 (+5.49 percentage points) and recall from 0.9205 to 0.9818 (+6.13 percentage points). These consistent improvements across all metrics highlight the superior ability of the ResMLP+KELM+Ada ensemble to capture complex data relationships while effectively reducing overfitting.
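The percentage-point gains quoted above follow directly from the rounded metric values, as the short check below shows. The dictionary names are chosen for the example only; the numbers are those stated in the paragraph.

```python
# Test-set metrics of the baseline MLP and of the ResMLP+KELM+Ada ensemble, as quoted in the text.
baseline = {"accuracy": 0.9297, "precision": 0.9263, "recall": 0.9205, "f1": 0.9151}
ensemble = {"accuracy": 0.9844, "precision": 0.9812, "recall": 0.9818, "f1": 0.9813}

# Absolute gain in percentage points = difference of the two proportions times 100.
gains_pp = {m: round((ensemble[m] - baseline[m]) * 100, 2) for m in baseline}

for metric, gain in gains_pp.items():
    print(f"{metric}: +{gain:.2f} percentage points")
```

These are absolute differences; a relative improvement (gain divided by the baseline value) would give different figures and should not be confused with them.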

Overall, the analysis highlights a systematic improvement in classification performance from the simpler individual models to the newly implemented composite hybrid ensemble. By integrating advanced intelligent techniques—such as residual networks, kernel-based learning, and weighted stacking—the proposed approach significantly enhances the accuracy and robustness of exercise recognition within the LegUp [44] robotic system. These results indicate that ensemble-based strategies [45] are the most effective solution for optimizing the intelligent module of LegUp, directly contributing to more precise and reliable robot-assisted rehabilitation.

4.5. Hybrid Stacking Ensemble Validation

A 5-fold cross-validation procedure [46] was used to further evaluate the robustness and generalization capacity of the proposed architecture. It was applied to the ResMLP and KELM-RBF-Manhattan models and to the hybrid ensemble ResMLP+KELM+Ada. The dataset, consisting of 1533 records, was divided into five mutually exclusive subsets, constructed so that records belonging to the same subject are never distributed across different folds. This subject-independent partitioning strategy prevents potential information leakage between the training and validation sets and ensures a faithful assessment of the models' ability to generalize to unseen subjects. At each iteration, one subset was used as the validation set, and the other four constituted the training set, which corresponds to an approximate ratio of 80% for training and 20% for validation in each fold. Repeating the evaluation on multiple data partitions provides a more reliable estimate of performance and reduces the risk of overfitting associated with a single train–validation split.
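One simple way to obtain such a subject-independent partition is to assign whole subjects to folds, as in the sketch below. The number of subjects, the subject identifiers, and the greedy balancing rule are assumptions for illustration; the exact procedure used in the study is not specified beyond the constraint that a subject never spans two folds.

```python
from collections import defaultdict

def subject_wise_folds(subject_ids, n_folds=5):
    """Assign whole subjects to folds, largest subjects first, keeping fold sizes balanced."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    folds = [[] for _ in range(n_folds)]
    for sid in sorted(by_subject, key=lambda s: len(by_subject[s]), reverse=True):
        smallest = min(folds, key=len)        # fold with the fewest records so far
        smallest.extend(by_subject[sid])      # every record of this subject goes together
    return folds

# Hypothetical example: 100 records from 10 subjects (10 records each).
subject_ids = [f"S{i % 10}" for i in range(100)]
folds = subject_wise_folds(subject_ids, n_folds=5)

for k, val_idx in enumerate(folds):
    val = set(val_idx)
    train_idx = [i for i in range(len(subject_ids)) if i not in val]
    print(f"fold {k}: {len(train_idx)} training records, {len(val_idx)} validation records")
```

Because each validation fold contains only subjects absent from the corresponding training set, the resulting scores estimate performance on unseen subjects rather than on unseen records of known subjects.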

Table 7 summarizes the performance obtained in the 5-fold cross-validation procedure, with all reported values corresponding specifically to the validation phase. The third column presents the performance obtained using the initial 75/25 split between the training and validation sets, serving as a reference point for interpreting subsequent results. The next five columns, corresponding to the 5 iterations, contain the values obtained in each fold of the cross-validation procedure, reflecting the variation in performance depending on the data partitioning. The penultimate column presents the average of the metrics over the five folds, providing an overall estimate of the model performance under variable partitioning conditions. The last column indicates the standard deviation associated with these results, representing the stability of the model and its sensitivity to changes in the structure of the training and validation sets.
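As a worked example of the last two columns, the snippet below recomputes the average and the standard deviation of the ResMLP accuracy from the five fold values listed in Table 7. The reported SD is reproduced by the sample standard deviation (n − 1 in the denominator); that this is the convention used throughout the table is an inference from this check, not something stated in the text.

```python
from statistics import mean, stdev

# ResMLP validation accuracy in the five folds (Table 7, Iteration 1 to Iteration 5).
resmlp_accuracy = [0.95891, 0.95752, 0.95695, 0.95765, 0.95439]

avg = round(mean(resmlp_accuracy), 5)   # "Average" column
sd = round(stdev(resmlp_accuracy), 5)   # "SD" column, sample standard deviation

print(avg, sd)
```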

For the ResMLP architecture, the results obtained through the cross-validation procedure highlight a stable behavior from one fold to another. The model achieves an average accuracy of 0.95708 and an average F1 score of 0.94605, values close to those obtained under the initial 75/25 split, which confirms the consistency of its performance. The low standard deviations associated with these metrics indicate that the model performance is not significantly influenced by the specific data partitioning method, suggesting a stable learning process and a reliable generalization capacity.

The KELM-RBF-Manhattan model exhibits similar consistency to the ResMLP architecture, achieving an average accuracy of 0.96036 and an average F1 score of 0.95246, values close to those obtained under the initial 75/25 split, confirming the stability of its performance. The low variability between folds confirms that the kernel-based learning mechanism provides reliable generalization under different data partitioning conditions, indicating a low sensitivity to the structure of the training and validation sets.

The stacking hybrid ensemble was implemented using an out-of-fold prediction strategy to prevent information leakage during meta-learner training. For each fold of the cross-validation procedure, the base models (ResMLP and KELM-RBF-Manhattan) were trained on the training partition, and their predictions were used as inputs for the AdaBoost meta-learner. The ensemble was then evaluated on the corresponding validation fold. The hybrid ResMLP+KELM+Ada ensemble consistently outperforms the individual models. Over the five folds, it achieves an average accuracy of 0.98356 and an average F1 score of 0.98186. The closeness of these values to those obtained in the initial 75/25 configuration confirms that the performance improvement is not dependent on a particular data partitioning.
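A generic out-of-fold routine is sketched below to make the leakage argument concrete. It is not the authors' code: the helper names, the toy data, and the trivial class-prior model standing in for ResMLP or KELM are assumptions, and the sketch shows only how every sample receives a prediction from a model that never saw it.

```python
from collections import Counter

def fit_prior_model(X_train, y_train, n_classes):
    """Hypothetical stand-in for a base learner: predicts the class priors seen in training."""
    counts = Counter(y_train)
    priors = [counts.get(c, 0) / len(y_train) for c in range(n_classes)]
    return lambda X: [priors[:] for _ in X]

def out_of_fold_probabilities(X, y, folds, fit_fn, n_classes):
    """For each fold, fit on the remaining folds and predict only the held-out rows."""
    oof = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        predict = fit_fn([X[i] for i in train_idx], [y[i] for i in train_idx], n_classes)
        for i, probs in zip(held_out, predict([X[i] for i in held_out])):
            oof[i] = probs
    return oof

# Toy data: 10 samples, 2 classes, 5 folds of 2 samples each.
X = [[float(i)] for i in range(10)]
y = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
folds = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

oof = out_of_fold_probabilities(X, y, folds, fit_prior_model, n_classes=2)
```

The rows of `oof` can then serve as meta-features for a meta-learner, since none of them was produced by a model trained on the corresponding label.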

The superior performance of the ensemble can be attributed to the complementary features of the ResMLP and KELM models, which capture different aspects of the IMU-derived feature space. The stacking strategy combines these heterogeneous predictions via the AdaBoost meta-learner, helping to reduce the variance of the individual models and strengthen their predictive ability. Thus, the cross-validation analysis confirms that the proposed hybrid architecture offers consistently superior and statistically stable performance. The low values of the standard deviation between folds indicate the robustness of the results and the fact that they are not influenced by a favorable data partition, thus strengthening the confidence in the generalizability of the proposed approach.

4.6. Statistical Significance Analysis of the Hybrid Ensemble Performance

To further support the previous observations and considering the relatively limited size of the dataset, statistical significance tests were conducted to rigorously evaluate the performance differences between the individual models and the proposed hybrid ensemble. First, the Wilcoxon signed-rank test [47] was applied to assess pairwise differences between the ResMLP architecture and the proposed hybrid ensemble, as well as between the KELM-RBF-Manhattan model and the same hybrid ensemble. The metric values used for this analysis [48] were obtained by repeating the 5-fold cross-validation procedure five times, resulting in a sufficiently large set of paired observations for reliable statistical comparison. In both cases (ResMLP vs. ResMLP+KELM+Ada and KELM-RBF-Manhattan vs. ResMLP+KELM+Ada), the Wilcoxon test returned a statistic value of 0 and a p-value of 0.0000000596. This outcome indicates that the hybrid ensemble consistently outperforms the individual models across all evaluated samples, with no observed cases of performance degradation.
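The reported p-value can be reproduced from the exact null distribution of the signed-rank statistic, as in the sketch below. It assumes 25 paired observations (five repetitions of 5-fold cross-validation), no ties or zero differences, and a two-sided exact test; the function name is introduced for this example only.

```python
def exact_wilcoxon_two_sided_p(statistic, n_pairs):
    """Exact two-sided p-value for the signed-rank statistic W = min(W+, W-), no ties or zeros."""
    max_sum = n_pairs * (n_pairs + 1) // 2
    counts = [0] * (max_sum + 1)   # counts[s] = number of rank subsets whose ranks sum to s
    counts[0] = 1
    for rank in range(1, n_pairs + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    lower_tail = sum(counts[: statistic + 1]) / 2 ** n_pairs
    return min(1.0, 2 * lower_tail)

p_25_pairs = exact_wilcoxon_two_sided_p(0, 25)  # all 25 differences favor the ensemble
p_5_pairs = exact_wilcoxon_two_sided_p(0, 5)    # best case with a single 5-fold run

print(f"{p_25_pairs:.3g}", p_5_pairs)
```

A statistic of 0 means that every paired difference has the same sign, so the p-value equals 2/2^25. With only five pairs the smallest attainable two-sided p-value is 0.0625, which illustrates why the cross-validation had to be repeated to obtain enough paired observations.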

In addition, the Friedman test [49] was employed to assess the global differences between the three compared models across the evaluated metrics. The test was performed on the same set of observations generated from the repeated evaluation protocol, namely the results obtained by repeating the 5-fold cross-validation procedure five times. This approach ensures consistency with the Wilcoxon analysis and provides a robust basis for multi-model statistical comparison. The results, summarized in Table 8, indicate extremely high χ² values and correspondingly low p-values, below the standard significance threshold (p < 0.05), for all evaluation metrics. The accuracy metric shows the strongest separation (χ² = 50.00, p = 0.0000000000138), indicating a highly consistent ranking separation across all evaluation instances. Similarly, the precision (χ² = 38.00, p = 0.0000000056), the recall (χ² = 40.88, p = 0.0000000013) and the F1-score (χ² = 48.08, p = 0.000000000036) metrics reflect the stable superiority of the hybrid ensemble. The magnitude of the χ² statistics, combined with p-values several orders of magnitude below the conventional significance threshold, demonstrates that the performance differences between models are systematic and not attributable to random variation.
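The relation between the χ² values and the p-values in Table 8 can be checked with a few lines of code. The sketch assumes the usual chi-square approximation with k − 1 = 2 degrees of freedom for k = 3 models and n = 25 evaluation instances; the function names are introduced for this example only.

```python
import math

def friedman_statistic(rank_sums, n_blocks):
    """Friedman chi-square statistic from the per-model rank sums over n_blocks instances."""
    k = len(rank_sums)
    return 12 * sum(r * r for r in rank_sums) / (n_blocks * k * (k + 1)) - 3 * n_blocks * (k + 1)

def chi2_sf_df2(x):
    """Survival function of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2)

# If the three models are ranked identically in all 25 instances, the rank sums are 25, 50, 75.
stat_accuracy = friedman_statistic([25, 50, 75], n_blocks=25)

# p-values implied by the chi-square values reported for each metric.
reported_chi2 = {"accuracy": 50.00, "precision": 38.00, "recall": 40.88, "f1": 48.08}
p_values = {m: chi2_sf_df2(x) for m, x in reported_chi2.items()}
```

Under these assumptions, χ² = 50 is the largest value the statistic can take for n = 25 and k = 3, which corresponds to the same model ordering in every evaluation instance for the accuracy metric.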

Taken together, the Wilcoxon and Friedman test results provide strong statistical evidence that the performance improvements achieved by the proposed hybrid architecture are consistent, robust, and reproducible. These findings reinforce the conclusion that the observed gains reflect a genuine advantage of the hybrid stacking approach rather than an artifact of data partitioning or random fluctuations.

4.7. Interpretability, Computational Efficiency, and Clinical Relevance

Although the present study primarily focuses on classification accuracy and model robustness, interpretability represents an important aspect for clinical rehabilitation applications. Future work will incorporate explainable AI techniques, such as SHAP-based feature attribution analysis, to quantify the contribution of individual IMU-derived features to the classification outcome. Such analysis would enable visualization of the model decision-making process, facilitate the identification of biomechanically relevant movement patterns and provide additional biological interpretability to support rehabilitation assessment.

From a computational perspective, the proposed architecture operates on a low-dimensional input space consisting of nine IMU-derived features, which significantly limits the complexity of the forward computations in the fully connected layers. Consequently, the inference stage involves only a small number of matrix operations and kernel evaluations. This lightweight computational structure makes the proposed approach compatible with real-time applications in robotic rehabilitation systems and wearable motion monitoring platforms.

5. Conclusions

This study presented the design, implementation, and evaluation of a hybrid classification approach aimed at improving rehabilitation exercise recognition in the LegUp robotic system. By combining a Residual Multilayer Perceptron with an optimized Kernel Extreme Learning Machine and integrating their outputs through a weighted stacking strategy, the proposed solution consistently outperformed the baseline MLP classifier and the individual base models across standard evaluation metrics. In particular, the best-performing hybrid configuration (ResMLP+KELM with AdaBoost as meta-learner) improved test accuracy from 0.9297 to 0.98438 and increased macro F1-score from 0.9151 to 0.98133, while also providing balanced gains in precision and recall. To further verify the robustness of these results, a 5-fold cross-validation procedure was conducted for the ResMLP, KELM-RBF-Manhattan, and hybrid ensemble models, and the statistical significance of the observed improvements was confirmed using the Wilcoxon signed-rank and Friedman tests.

The comparative analysis indicates that residual learning, nonlinear kernel projections, and hyperparameter optimization jointly enhance generalization and reduce overfitting, while the weighted stacking mechanism effectively fuses complementary decision patterns from heterogeneous learners. As a result, the proposed ensemble improves the reliability of the intelligent classification module, supporting more accurate identification of rehabilitation exercises and contributing to safer and more consistent robot-assisted therapy execution. Future work will focus on validating the approach under broader real-world conditions (e.g., subject-wise evaluation across multiple sessions, robustness to sensor placement variability, and testing on patient data), and on assessing deployment constraints such as computational footprint for real-time integration.

6. Patent

1. Pisla, D.; Birlescu, I.; Vaida, C.; Tucan, P.; Gherman, B.; Machado, J. Parallel Robot for the Rehabilitation of Lower Limb Joints in Two Planes. Romanian Patent, registration no: A 00116/20.03.2024.

Author Contributions

Conceptualization, A.-E.I. and C.V.; methodology, A.-E.I. and F.C.; software, A.-E.I.; validation, B.G. and F.C.; formal analysis, D.P. and C.V.; investigation, I.N. and F.C.; resources, P.T. and B.G.; data curation, A.B. and I.U.; writing—original draft preparation, A.-E.I.; writing—review and editing, D.P. and C.V.; visualization, J.M. and F.C.; supervision, D.P.; project administration, C.V.; funding acquisition, C.V. and D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been supported by the project “New frontiers in adaptive modular robotics for patient-centered medical rehabilitation—ASKLEPIOS”, funded by European Union—NextGenerationEU and Romanian Government, under National Recovery and Resilience Plan for Romania, contract no. 760071/23.05.2023, code CF 121/15.11.2022, with the Romanian Ministry of Research, Innovation and Digitalization, within Component 9, investment I8.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki involving healthy subjects. The protocol was approved by the Ethics Committee of contract no. 760071/23.05.2023 and 334906 on 12 February 2025.

Informed Consent Statement

Written informed consent has been obtained from the healthy subjects to publish this paper.

Data Availability Statement

The data presented in this study are included in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Covaciu, F.; Gherman, B.; Vaida, C.; Pisla, A.; Tucan, P.; Caprariu, A.; Pisla, D. A Combined Mirror–EMG Robot-Assisted Therapy System for Lower Limb Rehabilitation. Technologies 2025, 13, 227.
2. Birlescu, I.; Tohanean, N.; Vaida, C.; Gherman, B.; Neguran, D.; Horsia, A.; Tucan, P.; Condurache, D.; Pisla, D. Modeling and analysis of a parallel robotic system for lower limb rehabilitation with predefined operational workspace. Mech. Mach. Theory 2024, 198, 105674.
3. Mienye, I.D.; Sun, Y. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 2022, 10, 99129–99149.
4. Korial, A.E.; Gorial, I.I.; Humaidi, A.J. An Improved Ensemble-Based Cardiovascular Disease Detection System with Chi-Square Feature Selection. Computers 2024, 13, 126.
5. Majumder, A.B.; Gupta, S.; Singh, D.; Acharya, B.; Gerogiannis, V.C.; Kanavos, A.; Pintelas, P. Heart Disease Prediction Using Concatenated Hybrid Ensemble Classifiers. Algorithms 2023, 16, 538.
6. Motamedi, B.; Villányi, B. A Predictive Analytics Approach with Bayesian-Optimized Gentle Boosting Ensemble Models for Diabetes Diagnosis. Comput. Methods Programs Biomed. Update 2025, 7, 100184.
7. Arya, N.; Saha, S. Multi-Modal Classification for Human Breast Cancer Prognosis Prediction: Proposal of Deep-Learning Based Stacked Ensemble Model. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 19, 1032–1041.
8. Gunasekaran, H.; Gladys, A.; Kanmani, D.; Macedo, R.; Blessing, W. Brain Stroke Prediction Using Stacked Ensemble Model. J. Kejuruter. 2024, 36, 1759–1768.
9. Hossain, M.M.; Ahmed, M.M.; Hasan Rakib, M.; Zia, M.O.; Hasan, R.; Islam, M.R.; Islam, M.S.; Alam, M.S.; Islam, M.K. Optimizing Stroke Risk Prediction: A Primary Dataset-Driven Ensemble Classifier with Explainable Artificial Intelligence. Health Sci. Rep. 2025, 8, 70799.
10. Obaido, G.; Achilonu, O.; Ogbuokiri, B.; Amadi, C.S.; Habeebullahi, L.; Ohalloran, T.; Chukwu, C.W.; Mienye, E.; Aliyu, M.; Fasawe, O.; et al. An Improved Framework for Detecting Thyroid Disease Using Filter-Based Feature Selection and Stacking Ensemble. IEEE Access 2024, 12, 89098–89112.
11. Shakhovska, N.; Yakovyna, V.; Chopyak, V. A New Hybrid Ensemble Machine-Learning Model for Severity Risk Assessment and Post-COVID Prediction System. Math. Biosci. Eng. 2022, 19, 6102–6123.
12. Wang, Y.; Zhang, J.; Yuan, J.; Li, Q.; Zhang, S.; Wang, C.; Wang, H.; Wang, L.; Zhang, B.; Wang, C.; et al. Application of a Novel Nested Ensemble Algorithm in Predicting Motor Function Recovery in Patients with Traumatic Cervical Spinal Cord Injury. Sci. Rep. 2024, 14, 17403.
13. Aonpong, P.; Ai, Y.; Wang, W.; Li, Y.; Iwamoto, Y.; Han, X.; Chen, Y.W. Residual Multilayer Perceptrons for Genotype-Guided Recurrence Prediction of Non-Small Cell Lung Cancer. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Glasgow, UK, 11–15 July 2022; pp. 447–450.
14. Pacal, I. A novel Swin transformer approach utilizing residual multi-layer perceptron for diagnosing brain tumors in MRI images. Int. J. Mach. Learn. Cybern. 2024, 9, 3579–3597.
15. Chitra, S.; Alqahtani, T.M.; Alduraywish, A.; Sikkandar, M.Y. A robust chronic obstructive pulmonary disease classification model using dragonfly optimized kernel extreme learning machine. Sci. Rep. 2025, 15, 18702.
16. Eliazer, M.; Amaran, S.; Sreekumar, K.; Vikram, A.; Joshi, G.P.; Cho, W. Integrating vision transformer-based deep learning model with kernel extreme learning machine for non-invasive diagnosis of neonatal jaundice using biomedical images. Sci. Rep. 2025, 15, 25493.
17. Panoiu, M.; Ivascanu, P.; Panoiu, C. Analysis of Operating Regimes and THD Forecasting in Steelmaking Plant Power Systems using Advanced Neural Architectures. Mathematics 2025, 13, 3692.
18. Hinojosa, C.; Braet, J.; Springael, J. Performance Metrics for Multilabel Emotion Classification: Comparing Micro, Macro, and Weighted F1-Scores. Appl. Sci. 2024, 14, 9863.
19. Almasabe, A.; Ludi, S.; Elfaki, A.O. Scrutinizing UML Teaching and Learning Modeling Tools. Trans. Eng. Comput. Sci. 2023, 11, 1–21.
20. Tazin, A.; Kokar, M. UML Class Diagram Classification Using Category Theory. J. Softw. Eng. Appl. 2025, 18, 217–248.
21. Sindhu, D.G.; Manjunatha, G.C. Handwritten Digit Recognition Using Multilayer Neural Networks with Keras and Tensorflow. Int. J. Res. Appl. Sci. Eng. Technol. 2025, 13, 948–955.
22. Damasco, L.B.; Decripito, J.J. Generative Syntactic Analysis Using TensorFlow. Int. J. Eng. Comput. Sci. 2025, 14, 27655–27661.
23. Devasahayam, S. Advancing Flotation Process Modeling: Bayesian vs. Sklearn Approaches for Gold Grade Prediction. Minerals 2025, 15, 591.
24. Yanakiev, I.; Lazar, B.M.; Capiluppi, A. Applying SOLID Principles for the Refactoring of Legacy Code: An Experience Report. J. Syst. Softw. 2025, 220, 112254.
25. Sholihin, I.; Sunyoto, A. Improving Tomato Ripeness Classification Using Knowledge Distillation and Hyperparameter Optimization with Optuna. J. Electr. Eng. Comput. 2025, 7, 282–290.
26. Panoiu, M.; Panoiu, C.; Mezinescu, S.; Militaru, G.; Baciu, I. Machines Learning Techniques Applied to the Harmonic Analysis of Railway Power Supply. Mathematics 2023, 11, 1381.
27. Hinton, G.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Rachmad, Y.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
28. Jose, J.M. A Lightweight Deep Learning Framework using Resource-Efficient Batch Normalization for Sarcasm Detection. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2025, 11, 36–56.
29. Lu, Z.; Pu, H.; Wang, F.; Hu, Z.; Wang, L. The Expressive Power of Neural Networks: A View from the Width. Adv. Neural Inf. Process. Syst. 2017, 30, 6231–6239.
30. Touvron, H.; Bojanowski, P.; Caron, M.; Cord, M.; El-Nouby, A.; Grave, E.; Izacard, G.; Joulin, A.; Synnaeve, G.; Verbeek, J.; et al. ResMLP: Feedforward Networks for Image Classification with Data-Efficient Training. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 5314–5321.
31. Yu, S. An Efficient Kernel Extreme Learning Machine Approach for Bankruptcy Prediction. J. Comput. Sci. Artif. Intell. 2025, 2, 72–77.
32. Iosifidis, A.; Tefas, A.; Pitas, I. On the Kernel Extreme Learning Machine Classifier. Pattern Recognit. Lett. 2015, 65, 66–72.
33. Singh, H. Machine Learning Application of Generalized Gaussian Radial Basis Function and Its Reproducing Kernel Theory. Mathematics 2024, 12, 829.
34. Ermiş, T.; Şen, A.O.; Gielis, J. A New Approach to Circular Inversion in $l_{1}$-Normed Spaces. Symmetry 2024, 16, 874.
35. Liu, X.; Liu, Q.; Liu, S.; Yan, G.; Zhang, F.; Zeng, C.; Yang, X. A Dual-Norm Support Vector Machine: Integrating L1 and L∞ Slack Penalties for Robust and Sparse Classification. Processes 2025, 13, 2858.
36. Pulungan, A.F.; Zarlis, M.; Suwilo, S. Analysis of Braycurtis, Canberra and Euclidean Distance in KNN Algorithm. SinkrOn 2019, 4, 74–77.
37. Baranyi, J.; Csorba, S.; Farkas, Z.; Pacza, T.; Jóźwiak, Á. Internal dynamics of patent reference networks using the Bray–Curtis dissimilarity measure. J. Big Data 2024, 11, 27.
38. Muscalagiu, I.; Popa, H.E.; Negru, V. Improving the Performances of Asynchronous Search Algorithms in Scale-Free Networks using the Nogoood Processor Technique. Comput. Inform. 2015, 34, 254–274.
39. Puppala, A. A Comprehensive Review of Linear Regression Models: Theory, Applications, and Advanced Techniques. Int. J. Mach. Learn. Cybern. 2025, 3, 1–6.
40. Santos, L.F.F.M.; Sánchez-Tena, M.Á.; Alvarez-Peregrina, C.; Martinez-Perez, C. Artificial Intelligence-Driven Diagnostics in Eye Care: A Random Forest Approach for Data Classification and Predictive Modeling. Algorithms 2025, 18, 647.
41. Kyrychek, M. Gradient boosting as a tool for solving classification problems in data-constrained environments. Technol. Eng. 2025, 26, 37–47.
42. Nasution, M.; Munthe, I.R.; Nasution, F.A.; Defit, S. Optimizing Text Classification Using Techniques AdaBoost Ensemble with Decision Tree Algorithm. CogITo Smart J. 2025, 11, 39–51.
43. Serviana, D.; Pribadi, O.; Robet, R. Ensemble of Random Forest and Adaboost Algorithms for Human Skin Disease Classification. J. Artif. Intell. Eng. Appl. 2025, 5, 109–115.
44. Tohanean, N.; Tucan, P.; Vanta, O.-M.; Abrudan, C.; Pintea, S.; Gherman, B.; Burz, A.; Banica, A.; Vaida, C.; Neguran, D.A.; et al. The Efficacity of the NeuroAssist Robotic System for Motor Rehabilitation of the Upper Limb—Promising Results from a Pilot Study. J. Clin. Med. 2023, 12, 425.
45. Nandan, N. Hybrid Ensemble and Stacking Approach for Classification of Parkinson’s Disease. Int. J. Appl. Math. 2025, 38, 1588–1602.
46. Manza, Y.; Rosnelly, R.; Furqan, M.; Reza, B. Optimized KNN Performance with PCA and K-Fold Cross-Validation for Colorectal Cancer Survival Prediction. J. Tek. Inform. 2026, 7, 361–372.
47. Chen, X.; Diaz, F. On the Asymptotic Distribution of the Wilcoxon Signed Rank Test Statistic. J. Probab. Stat. Sci. 2026, 24, 230–235.
48. Adegbaye, A.L.; Adeoti, O.A. Neutrosophic Wilcoxon Signed-Rank Test with Application to Diabetes Data. J. Med. Dent. Sci. Res. 2024, 11, 17–23.
49. Inyang, E.; Moffat, I.; Clement, E. Friedman Test Technique for Optimizing a Seasonal Box–Jenkins ARIMA Model Building. J. Probab. Stat. Sci. 2024, 22, 18–39.

Figure 1. General architecture of the LegUp system.

Figure 2. Distribution of exercise type classes.

Figure 3. Distribution of exercise type classes in training and validation sets.

Figure 4. UML use case diagram.

Figure 5. UML class diagram.

Figure 6. DenseMLP architecture.

Figure 7. DropoutMLP architecture.

Figure 8. BatchNormMLP architecture.

Figure 9. WideMLP architecture.

Figure 10. ResMLP architecture.

Figure 11. MLP metrics.

Figure 12. KELM metrics.

Figure 13. Metrics for KELM with RBF kernel.

Figure 14. Novel hybrid stacking ensemble architecture.

Figure 15. Metrics for Hybrid Stacking Ensembles.

Figure 16. Metrics for the compared models.

Table 1. Description of the input data.

Signal	Signal Meaning	Coordinate	Minimum	Maximum	Mean	Standard Deviation
Signal 1	Position of the first sensor IMU	X-coordinate	13.0772	13.3605	13.1956	0.0452
Signal 2		Y-coordinate	1.3508	1.7486	1.4982	0.1088
Signal 3		Z-coordinate	18.6771	18.8718	18.7727	0.0346
Signal 4	Position of the second sensor IMU	X-coordinate	13.1073	13.2689	13.1463	0.0297
Signal 5		Y-coordinate	1.3299	1.6129	1.4293	0.0839
Signal 6		Z-coordinate	18.9079	19.0007	18.9395	0.0214
Signal 7	Position of the third sensor IMU	X-coordinate	13.0252	13.1045	13.0400	0.0163
Signal 8		Y-coordinate	1.3774	1.5085	1.4357	0.0480
Signal 9		Z-coordinate	19.1981	19.2658	19.2262	0.0214

Table 2. Optimized hyperparameters.

Hyperparameter	Search Domain	Model	Selected Value
Epochs	{100, 200, 300, 400, 500, 600, 700, 800, 900, 1000}	DenseMLP	700
		DropoutMLP	300
		BatchNormMLP	400
		WideMLP	500
		ResMLP	600
Learning rate	{0.00001, 0.00005, 0.0001, 0.0005, 0.001}	DenseMLP	0.0005
		ResMLP	0.0005
		DropoutMLP	0.0001
		BatchNormMLP	0.0001
		WideMLP	0.001
Batch size	{8, 16, 32, 64}	DenseMLP	16
		DropoutMLP	16
		BatchNormMLP	16
		ResMLP	16
		WideMLP	32
Number of hidden layers/blocks	{6, 7, 8, 9, 10, 11, 12, 13, 14, 15}	DenseMLP	13
	{6, 7, 8, 9, 10, 11, 12, 13, 14, 15}	BatchNormMLP	6
	{2, 3, 4, 5, 6, 7, 8, 9, 10}	DropoutMLP	3
	{2, 3, 4, 5, 6, 7, 8, 9, 10}	ResMLP	7
	{1, 2, 3, 4, 5}	WideMLP	2
Hidden layer size	{8, 12, 16, 20, 24, 28, 32}	DenseMLP	28/20/32/16/20/16/16/16/28/28/32/24/16
		DropoutMLP	20/24/24
		BatchNormMLP	16/24/20/28/32/16
	{256, 512, 1024, 2048, 4096}	WideMLP	2048/1024
	{16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64}	ResMLP	16/36/52/56/48/52/16
Activation function	{ReLU, GELU, ELU, Tanh}	DenseMLP	ReLU/ELU/ELU/GELU/Tanh/GELU/Tanh/GELU/Tanh/GELU/ELU/Tanh/ReLU
		DropoutMLP	Tanh/Tanh/GELU
		BatchNormMLP	ReLU/Tanh/GELU/ReLU/ELU/GELU
		WideMLP	ReLU/GELU
		ResMLP	ReLU/GELU/ELU/GELU/ELU/GELU/ReLU
Dropout rate	{0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5}	DropoutMLP	0.2/0.3/0.25
Expansion factor	{2, 3, 4, 5, 6}	ResMLP	3/4/5/5/4/4/3

Table 3. Metric values for MLP architectures.

Model	Phase	Accuracy	Precision	Recall	F1 Score
DenseMLP	Train	0.91906	0.93885	0.88830	0.90781
DenseMLP	Validation	0.92188	0.93245	0.90346	0.91287
DropoutMLP	Train	0.90601	0.92540	0.86729	0.88504
DropoutMLP	Validation	0.92448	0.94314	0.86764	0.89128
BatchNormMLP	Train	0.93473	0.96178	0.90482	0.92806
BatchNormMLP	Validation	0.94010	0.96102	0.92033	0.93612
WideMLP	Train	0.92080	0.92497	0.90805	0.90901
WideMLP	Validation	0.94271	0.93902	0.93441	0.93189
ResMLP	Train	0.94691	0.92818	0.95113	0.93562
ResMLP	Validation	0.95833	0.93749	0.96016	0.94523

Table 4. Metric values for KELM architectures.

Kernel Type	Phase	Accuracy	Precision	Recall	F1 Score
Linear	Train	0.86249	0.79217	0.79480	0.78846
Linear	Validation	0.88021	0.78718	0.80026	0.78954
Polynomial	Train	0.93212	0.93189	0.91561	0.91658
Polynomial	Validation	0.94792	0.94267	0.93739	0.93543
RBF	Train	0.94778	0.93650	0.92829	0.93123
RBF	Validation	0.95573	0.94890	0.94960	0.94787

Table 5. Metric values for KELM with RBF kernel.

Norm	Phase	Accuracy	Precision	Recall	F1 Score
Euclid	Train	0.94778	0.93650	0.92829	0.93127
Euclid	Validation	0.95573	0.94890	0.94960	0.94787
Manhattan	Train	0.95561	0.94859	0.94789	0.94736
Manhattan	Validation	0.96094	0.95508	0.96001	0.95652
Chebyshev	Train	0.94517	0.95187	0.92462	0.93390
Chebyshev	Validation	0.94792	0.95147	0.93800	0.94156
Canberra	Train	0.94952	0.93658	0.94744	0.93747
Canberra	Validation	0.95052	0.93382	0.95233	0.93867
Bray-Curtis	Train	0.94865	0.95403	0.92660	0.93575
Bray-Curtis	Validation	0.95313	0.95493	0.94160	0.94507

Table 6. Metric values for hybrid stacking ensembles.

Stacking Ensemble	Phase	Accuracy	Precision	Recall	F1 Score
ResMLP+KELM+LR	Train	0.96087	0.94692	0.95970	0.95018
ResMLP+KELM+LR	Validation	0.96354	0.94253	0.96437	0.94973
ResMLP+KELM+RF	Train	0.96522	0.96561	0.94671	0.95283
ResMLP+KELM+RF	Validation	0.96615	0.96363	0.95365	0.95614
ResMLP+KELM+GB	Train	0.96739	0.98014	0.92835	0.94699
ResMLP+KELM+GB	Validation	0.96354	0.97413	0.94105	0.95503
ResMLP+KELM+Ada	Train	0.99022	0.99270	0.99152	0.99204
ResMLP+KELM+Ada	Validation	0.98438	0.98117	0.98178	0.98133

Table 7. Validation-phase metric values obtained with 5-fold cross-validation.

Model	Metric	Initial Split	Iteration 1	Iteration 2	Iteration 3	Iteration 4	Iteration 5	Average	SD
ResMLP	Accuracy	0.95833	0.95891	0.95752	0.95695	0.95765	0.95439	0.95708	0.00167
	Precision	0.93749	0.95671	0.94117	0.93552	0.95326	0.94302	0.94594	0.00880
	Recall	0.96016	0.94681	0.95964	0.94276	0.94655	0.95508	0.95017	0.00695
	F1-score	0.94523	0.94867	0.94781	0.93873	0.94641	0.94861	0.94605	0.00419
KELM	Accuracy	0.96094	0.96078	0.96091	0.96093	0.96086	0.95833	0.96036	0.00114
	Precision	0.95508	0.96243	0.94691	0.95836	0.96165	0.93731	0.95333	0.01090
	Recall	0.96001	0.96321	0.95461	0.95061	0.95221	0.95931	0.95599	0.00520
	F1-score	0.95652	0.96132	0.95004	0.95174	0.95525	0.94397	0.95246	0.00642
Hybrid ensemble	Accuracy	0.98438	0.98239	0.98043	0.98693	0.98371	0.98433	0.98356	0.00241
	Precision	0.98117	0.97492	0.97876	0.98430	0.98636	0.97789	0.98045	0.00474
	Recall	0.98178	0.98084	0.98271	0.98259	0.98905	0.98280	0.98360	0.00315
	F1-score	0.98133	0.97764	0.98067	0.98304	0.98767	0.98026	0.98186	0.00377
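
The Average and SD columns of Table 7 can be recomputed from the five iteration values. The sketch below does this for the accuracy rows using only the Python standard library. The variable and function names are illustrative; the use of the sample standard deviation (n - 1 in the denominator) is an inference from the tabulated values rather than a statement found in the table.

```python
import statistics

# Per-fold validation accuracy copied from Table 7 (Iteration 1 to Iteration 5).
fold_accuracy = {
    "ResMLP": [0.95891, 0.95752, 0.95695, 0.95765, 0.95439],
    "KELM": [0.96078, 0.96091, 0.96093, 0.96086, 0.95833],
    "Hybrid ensemble": [0.98239, 0.98043, 0.98693, 0.98371, 0.98433],
}

def summarize(scores):
    """Return (mean, sample standard deviation), each rounded to five decimals."""
    return round(statistics.mean(scores), 5), round(statistics.stdev(scores), 5)

summary = {model: summarize(scores) for model, scores in fold_accuracy.items()}

for model, (average, sd) in summary.items():
    print(f"{model}: average={average:.5f}, SD={sd:.5f}")
```

The same `summarize` helper applies to the precision, recall, and F1-score rows. Replacing `statistics.stdev` with `statistics.pstdev` would give the population standard deviation, which yields smaller values than those listed.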

Table 8. Friedman test results.

Metric	χ²	p-value
Accuracy	50.00	1.38 × 10⁻¹¹
Precision	38.00	5.6 × 10⁻⁹
Recall	40.88	1.3 × 10⁻⁹
F1-score	48.08	3.6 × 10⁻¹¹
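
The p-values in Table 8 can be approximately reproduced from the χ² statistics. The sketch below assumes that three classifiers are compared (ResMLP, KELM, and the hybrid ensemble), so that the Friedman statistic is referred to a chi-square distribution with k - 1 = 2 degrees of freedom. This number of degrees of freedom is an assumption, not a value given in the table. For two degrees of freedom the upper-tail probability has the closed form exp(-χ²/2).

```python
import math

# Friedman chi-square statistics copied from Table 8.
chi_square = {
    "Accuracy": 50.00,
    "Precision": 38.00,
    "Recall": 40.88,
    "F1-score": 48.08,
}

def p_value_df2(statistic):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom.

    Assumption: three compared models, hence df = 3 - 1 = 2.
    """
    return math.exp(-statistic / 2.0)

p_values = {metric: p_value_df2(stat) for metric, stat in chi_square.items()}

for metric, p in p_values.items():
    print(f"{metric}: chi2={chi_square[metric]:.2f}, p={p:.2e}")
```

Under this assumption the computed values agree with the tabulated p-values to within rounding, and all of them lie far below the conventional 0.05 significance threshold.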

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Iordan, A.-E.; Covaciu, F.; Vaida, C.; Nadas, I.; Banica, A.; Gherman, B.; Ulinici, I.; Machado, J.; Tucan, P.; Pisla, D. A Novel Hybrid Stacking Ensemble Classifier for the LegUp Robot Used in Lower Limb Rehabilitation. AI 2026, 7, 177. https://doi.org/10.3390/ai7050177

