3.2. Description of the Existing Intelligent Model
The analyzed rehabilitation system transmits data collected from IMU sensors in real time, via WiFi, to a software application developed in C#. This application then forwards the data to a Python program, which classifies the exercises using a previously implemented and validated multilayer perceptron (MLP) neural network model. The network identifies and classifies the type of exercise that the patient performs with the healthy limb, based on the orientation data provided by the IMU sensors.
The MLP neural network consists of an input layer, three hidden layers, and an output layer. The input layer receives nine features extracted from the data generated by the IMU sensors. The first hidden layer contains 16 neurons and uses the ReLU activation function, followed by a second hidden layer with 32 neurons, also using ReLU. The third hidden layer consists of 15 neurons and employs the SoftMax activation function. The output layer performs classification into five classes corresponding to the different types of exercises.
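As a rough sanity check on the size of this network, the layer widths given above can be turned into a count of trainable parameters. The sketch below assumes plain fully connected layers with one bias per neuron and no additional layers (such as dropout or normalization); the name `dense_parameter_count` is illustrative and does not come from the original implementation.

```python
# Layer widths as described above: 9 inputs, hidden layers of 16, 32 and 15
# neurons, and a 5-class output layer.
LAYER_SIZES = [9, 16, 32, 15, 5]


def dense_parameter_count(layer_sizes):
    """Return the number of weights and biases of a fully connected network.

    Assumption: every layer is dense and has one bias per output neuron.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


if __name__ == "__main__":
    print(dense_parameter_count(LAYER_SIZES))
```

Under these assumptions the existing model has on the order of a thousand trainable parameters, which is small relative to the 1533 available records.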
The model was trained for 200 epochs, and its performance was evaluated using two metrics: accuracy, which reached approximately 0.916 on the training set and 0.929 on the validation set, and the loss function, with values of 0.299 for training and 0.264 for validation. These results suggest an effective classification model that still leaves room for improvement. Therefore, in this study, multiple variants of MLP neural networks will be trained with the aim of obtaining a robust and stable classification model, and the new models will be evaluated using a broader set of metrics to ensure a comprehensive analysis of performance.
3.3. Dataset Structure
The aforementioned system uses three IMU sensors placed on the healthy limb, at the thigh, shank, and foot, each providing three orientation values. The dataset used in this study was originally introduced in our previous work [1]. These data are used to identify the type of exercise performed, among the five possible classes: hip abduction, hip flexion, knee flexion, ankle dorsiflexion, and ankle inversion. For the classification process, a dataset consisting of 1533 records is used, each containing 10 attributes: 9 numerical input features and one output variable indicating the exercise class. The meaning of each input feature, along with four statistical indicators (minimum, maximum, mean, and standard deviation), is presented in Table 1.
The 1533 records of the dataset include an additional piece of information that was not used as an explicit input feature during training or validation, namely the identifier of the participant from whom each record originates. This information was used exclusively to ensure the correct allocation of records to either the training set or the validation set. The nine signals used as input variables in the classification process have numerical values characterized by these four statistical indicators, reflecting the variation and consistency of the recorded data. The mean values of the signals range from 1.4293 to 19.2262, with standard deviations between 0.0163 (Signal 7) and 0.1088 (Signal 2), indicating relatively low dispersion within the dataset. These characteristics contribute to defining the numerical profile of each exercise, facilitating the machine learning process.
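One way to use a participant identifier for allocation is to keep all records of a participant in the same subset. The sketch below illustrates that idea only; it assumes this is what "correct allocation" means here, and the function name `split_by_participant` and the `(participant_id, features, label)` tuple layout are hypothetical.

```python
import random


def split_by_participant(records, validation_fraction=0.25, seed=0):
    """Split records so that each participant lands in exactly one subset.

    `records` is a list of (participant_id, features, label) tuples; this
    layout is an assumption made for the illustration.
    """
    participants = sorted({pid for pid, _, _ in records})
    rng = random.Random(seed)
    rng.shuffle(participants)
    # The fraction is applied to participants, not to individual records.
    n_val = max(1, round(len(participants) * validation_fraction))
    val_ids = set(participants[:n_val])
    train = [r for r in records if r[0] not in val_ids]
    val = [r for r in records if r[0] in val_ids]
    return train, val
```

Because the fraction is applied to participants, the resulting record counts only approximate the requested proportion when participants contribute different numbers of records.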
Regarding the output variable, the 1533 records are distributed across five distinct classes as follows: 109 records belong to the Dorsiflexion class, 301 to Hip Abduction, 441 to Hip Flexion, 204 to Inversion, and 478 to Knee Flexion. The distribution of these classes is illustrated in Figure 2. For the classification process, the dataset was divided into two subsets: 75% of the records (1149 instances) were reserved for model training, while the remaining 25% (384 instances) were used for validation. As illustrated in Figure 3, the distribution of the five exercise classes (Dorsiflexion, Hip Abduction, Hip Flexion, Inversion, and Knee Flexion) across the training and validation subsets reflects a proportional and balanced allocation, with 84/25, 218/83, 346/95, 157/47, and 344/134 training/validation instances for each class, respectively.
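The split sizes reported above can be cross-checked with a few lines of Python. The sketch below pairs the training/validation counts with the class names in the order in which they are listed for Figure 3; the helper names are illustrative.

```python
# (training, validation) counts per class, in the order listed for Figure 3.
SPLIT_COUNTS = {
    "Dorsiflexion": (84, 25),
    "Hip Abduction": (218, 83),
    "Hip Flexion": (346, 95),
    "Inversion": (157, 47),
    "Knee Flexion": (344, 134),
}


def split_totals(counts):
    """Return the total number of training and validation records."""
    train_total = sum(train for train, _ in counts.values())
    val_total = sum(val for _, val in counts.values())
    return train_total, val_total


def validation_share(counts):
    """Return, per class, the fraction of its records used for validation."""
    return {name: val / (train + val) for name, (train, val) in counts.items()}


if __name__ == "__main__":
    print(split_totals(SPLIT_COUNTS))
    for name, share in validation_share(SPLIT_COUNTS).items():
        print(f"{name}: {share:.3f}")
```

The per-class totals add up to the 1149 training and 384 validation instances stated above.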
3.4. Evaluation Metrics
The classification models proposed and implemented in this study are evaluated using four standard classification metrics: accuracy, macro precision, macro recall, and F1 score.
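In the context of multi-class classification, accuracy quantifies the proportion of correct predictions relative to the total number of predictions generated by the model, and is defined by Equation (1):
$$\mathrm{Accuracy} = \frac{1}{N}\sum_{k=1}^{K} TP_k \tag{1}$$
where $TP_k$ is the number of instances of class $k$ that were correctly classified, $K = 5$ is the number of classes, and $N$ is the total number of analyzed instances.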
The evaluation of accuracy in a five-class classification context involves the following steps:
For each instance in the dataset (training or validation), the label predicted by the model is compared with the corresponding true label.
The total number of correct predictions is determined by counting the cases in which the predicted class exactly matches the true class.
The accuracy value is then calculated by dividing the number of correct predictions by the total number of analyzed instances.
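The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the study; the function name `accuracy` and the integer label encoding are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one instance is required")
    # Steps 1 and 2: compare each prediction with its true label and count matches.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Step 3: divide by the total number of analyzed instances.
    return correct / len(y_true)


if __name__ == "__main__":
    true_labels = [0, 1, 2, 3, 4, 0, 1, 2]
    predicted = [0, 1, 2, 3, 4, 1, 1, 0]
    print(accuracy(true_labels, predicted))
```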
In the case of balanced datasets, another significant metric is macro precision. This metric provides an aggregated perspective on the model's performance by calculating the arithmetic mean of the individual precisions obtained for each class. The calculation formula is defined in Equation (2):
$$\mathrm{Precision}_{\mathrm{macro}} = \frac{1}{K}\sum_{k=1}^{K}\frac{TP_k}{TP_k + FP_k} \tag{2}$$
where $K$ is the number of classes, $TP_k$ has the same meaning as in Equation (1), and $FP_k$ represents the number of examples that belong to other classes but were incorrectly classified as class $k$.
The recall metric [17] evaluates the model's ability to correctly identify all instances belonging to a given class. This metric quantifies the proportion of correctly recognized elements relative to the total number of actual elements of that class. In the context of five-class classification with balanced datasets, macro recall is used, representing the arithmetic mean of the individual class recalls, as defined in Equation (3):
$$\mathrm{Recall}_{\mathrm{macro}} = \frac{1}{K}\sum_{k=1}^{K}\frac{TP_k}{TP_k + FN_k} \tag{3}$$
where $K$ is the number of classes, $TP_k$ has the same meaning as in Equation (1), and $FN_k$ represents the number of examples that belong to class $k$ but were incorrectly classified into another class. Thus, recall provides a relevant measure of the model's performance in terms of correctly identifying the instances of each class, and is particularly important in contexts where the cost of omissions is high.
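The F1 score [18] is a composite metric that balances the contribution of precision and recall. In the context of multi-class classification with five classes, this metric is calculated individually for each class and then aggregated at the global level, as defined in Equation (4):
$$F1_{\mathrm{macro}} = \frac{1}{K}\sum_{k=1}^{K}\frac{2\,TP_k}{2\,TP_k + FP_k + FN_k} \tag{4}$$
where $K$ is the number of classes and $TP_k$, $FP_k$, and $FN_k$ denote the true positives, false positives, and false negatives of class $k$, respectively.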
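The three macro-averaged metrics can be derived from per-class counts of true positives, false positives, and false negatives. The sketch below is a minimal pure-Python illustration under two assumptions: the per-class F1 scores are aggregated by an unweighted mean, and a class with a zero denominator contributes a score of 0. The function names are illustrative and are not those of the Metric class used in this study.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Return per-class true positives, false positives and false negatives."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the instance belongs elsewhere
            fn[t] += 1  # an instance of class t was missed
    return tp, fp, fn


def _ratio(numerator, denominator):
    # Assumption: an undefined ratio (zero denominator) is scored as 0.
    return numerator / denominator if denominator else 0.0


def macro_scores(y_true, y_pred, n_classes=5):
    """Return macro precision, macro recall and macro F1 as a tuple."""
    tp, fp, fn = confusion_counts(y_true, y_pred, n_classes)
    classes = range(n_classes)
    precision = sum(_ratio(tp[k], tp[k] + fp[k]) for k in classes) / n_classes
    recall = sum(_ratio(tp[k], tp[k] + fn[k]) for k in classes) / n_classes
    f1 = sum(_ratio(2 * tp[k], 2 * tp[k] + fp[k] + fn[k]) for k in classes) / n_classes
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
    y_pred = [0, 1, 1, 1, 2, 2, 3, 3, 4, 0]
    print(macro_scores(y_true, y_pred))
```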
3.5. UML Design of Intelligent Classification Software
In order to optimize the MLP model proposed in study [1], a new Python 3.12.10 application is designed in which several basic intelligent methods are analyzed and combined to obtain a new hybrid classification model. The system functionality was described using a UML use case diagram, which provides a structured representation of the main functional components of the application developed using the Python programming language. Employing a UML diagram allows highlighting the actors involved, their interactions with the application, and the operations performed within the intelligent classification process. Shown in Figure 4, this diagram was designed using the StarUML 3.2.2 tool [19]. According to the representation, the classification software involves two actors:
The human user, who controls the entire process of configuring, training, selecting, and evaluating the models.
The software application, developed in C# for robot control, which will use the newly proposed and trained hybrid classification model to make predictions.
This diagram includes 24 use cases, covering all system functionalities, from loading the dataset and selecting MLP or KELM architectures to optimizing parameters, training the models, validating them, generating hybrid ensembles, and saving the evaluation results.
Among the most important use cases are loading data from a CSV file, splitting the dataset, selecting the type of MLP, tuning parameters using Optuna 4.6.0, training and validating the models, building a hybrid stacking ensemble, and comparing model performances. Together, these use cases specify the functionalities of the application. At the structural level, six association relationships can be identified between the human-user actor and various use cases, reflecting the user's direct involvement in the configuration and evaluation stages. In addition, the C# software application actor is associated with a dedicated use case for loading and using the model for prediction.
Furthermore, the diagram contains five dependency relationships, which highlight the logical order of operations. For example, dataset splitting depends on loading the dataset, while model validation depends on the completion of the training process. At the same time, nine extension relationships are illustrated, used to detail specific behaviors within general use cases. Among these are the specializations for the types of MLP, as well as the variants of the new hybrid model proposed in this study.
Overall, the use case diagram provides a complete and structured perspective on how users interact with the intelligent classification software, while also emphasizing the system's modular architecture and the interdependencies among its components.
Starting from the specific functionalities of the Python software depicted in the use case diagram, 15 classes were identified, designed, and implemented to meet the proposed specifications. The UML class diagram [20], shown in Figure 5, illustrates the resulting architectural structure, comprising 2 abstract classes and 13 concrete classes, as well as the structural and behavioral relationships among them.
The first central component is the abstract class MLPClassifier, from which five concrete classes are derived, corresponding to the five MLP neural network architectures implemented and analyzed in this study. In the implementation of these five concrete classes, several classes defined in the Keras 3.11.0 [21] and TensorFlow 2.19.0 [22] libraries are used, providing the mechanisms required for building, training, and validating neural networks. Each of these classes implements the buildModel() method in accordance with the structural characteristics of the specific MLP architecture it represents, while the ResMLPClassifier class additionally introduces the residualBlock() method for defining residual blocks.
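The relationship between the abstract base class and a residual subclass can be illustrated with a small, library-free sketch. It reuses the class and method names from the diagram, but the signatures, the list-based layer description, and the helper functions `relu` and `dense` are assumptions made for illustration; the actual implementation relies on Keras and TensorFlow.

```python
from abc import ABC, abstractmethod


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(values, weights, biases):
    """Fully connected layer: one row of `weights` per output neuron."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, biases)
    ]


class MLPClassifier(ABC):
    """Abstract base class: each subclass describes its own architecture."""

    @abstractmethod
    def buildModel(self):
        """Return the architecture as a list of (layer type, width) pairs."""


class ResMLPClassifier(MLPClassifier):
    """Concrete variant that adds a residual (skip) connection."""

    def __init__(self, width):
        self.width = width

    def residualBlock(self, values, weights, biases):
        # Hypothetical signature: add the block input to its transformed output.
        transformed = relu(dense(values, weights, biases))
        return [x + t for x, t in zip(values, transformed)]

    def buildModel(self):
        return [("dense", self.width), ("residual", self.width)]
```

The skip connection requires the block input and output to have the same width, which is why the sketch uses a single `width` for both layers.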
The KELMClassifier class and its subclass KELMNormClassifier are responsible for implementing the kernel-based Extreme Learning Machine (KELM) classifier. These classes provide methods for kernel computation, training, testing, and evaluation of KELM models, making use of the scikit-learn [23] library.
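For readers unfamiliar with KELM, the sketch below shows the standard closed-form training rule, in which the output weights are obtained by solving a regularized linear system on the kernel matrix. It is a minimal NumPy illustration, not the KELMClassifier implementation of this study: the class name `SimpleKELM`, the RBF kernel, and the values of `C` and `gamma` are assumptions.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of `a` and the rows of `b`."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class SimpleKELM:
    """Kernel ELM sketch: beta = (K + I / C)^-1 T, scores = K(x, X_train) beta."""

    def __init__(self, C=100.0, gamma=0.5):
        self.C = C          # regularization strength (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T, one column per class.
        targets = (y[:, None] == self.classes_[None, :]).astype(float)
        kernel = rbf_kernel(X, X, self.gamma)
        self.X_train_ = X
        self.beta_ = np.linalg.solve(kernel + np.eye(len(X)) / self.C, targets)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = rbf_kernel(X, self.X_train_, self.gamma) @ self.beta_
        return self.classes_[np.argmax(scores, axis=1)]


if __name__ == "__main__":
    X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    y_train = [0, 0, 1, 1]
    model = SimpleKELM().fit(X_train, y_train)
    print(model.predict([[0.2, 0.5], [5.0, 5.5]]))
```

Training requires solving a linear system whose size equals the number of training records, which remains inexpensive for a dataset of roughly a thousand records.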
Another central element of the diagram is the abstract class HybridStackingWeightedClassifier, which defines the general mechanism of the weighted hybrid classifier introduced in this study. From this abstract class, four concrete classes are derived, corresponding to the four variants of the hybrid stacking ensemble implemented and analyzed. Their implementation relies on models provided by the scikit-learn library: LogisticRegression, RandomForestClassifier, GradientBoostingClassifier, and AdaBoostClassifier.
This abstract class is derived from the BaseEstimator class defined in sklearn.base, and during the scaling process it uses the StandardScaler class from sklearn.preprocessing. The abstract class HybridStackingWeightedClassifier is composed of two objects, one of type ResMLPClassifier and one of type KELMNormClassifier, as indicated by the two composition relationships between these classes.
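The composition described above can be pictured with a short structural sketch. The weighting scheme shown here (scaling each base model's class probabilities before concatenating them into meta-features) is only an assumption about how a weighted stacking ensemble might be organized, and it is not taken from the implementation; the names `WeightedStackingSketch`, `metaModelName`, `metaFeatures`, `mlp_weight`, and the `predict_proba` interface of the base models are all hypothetical.

```python
from abc import ABC, abstractmethod


class WeightedStackingSketch(ABC):
    """Abstract ensemble composed of two base models (hypothetical design)."""

    def __init__(self, res_mlp, kelm_norm, mlp_weight=0.5):
        self.res_mlp = res_mlp      # plays the role of the ResMLPClassifier part
        self.kelm_norm = kelm_norm  # plays the role of the KELMNormClassifier part
        self.mlp_weight = mlp_weight

    @abstractmethod
    def metaModelName(self):
        """Name of the meta-model that consumes the meta-features."""

    def metaFeatures(self, sample):
        # Assumed scheme: weight each base model's probabilities, then concatenate.
        w = self.mlp_weight
        p_mlp = self.res_mlp.predict_proba(sample)
        p_kelm = self.kelm_norm.predict_proba(sample)
        return [w * p for p in p_mlp] + [(1.0 - w) * p for p in p_kelm]


class LogisticStackingSketch(WeightedStackingSketch):
    """One concrete variant; the others would differ only in the meta-model."""

    def metaModelName(self):
        return "LogisticRegression"


class FixedProbabilities:
    """Stand-in base model that always returns the same class probabilities."""

    def __init__(self, probabilities):
        self.probabilities = probabilities

    def predict_proba(self, sample):
        return list(self.probabilities)
```

In this arrangement each concrete subclass only has to name its meta-model, while the handling of the two base models stays in the abstract class.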
The OptunaOptimizer class, which uses the Optuna library, is employed to identify the optimal values for the hyperparameters of the networks defined by the MLPClassifier and KELMClassifier classes, as well as their derived classes. This automated optimization contributes to improving model performance and removes the need for manual hyperparameter selection. The Metric class defines four metrics specific to the multiclass classification problem, using the sklearn.metrics module, and is used in the evaluation of all analyzed models.
Through inheritance, aggregation, and association relationships, the class diagram highlights the modularity of the software and the clear separation of responsibilities among its components. Moreover, the proposed architecture facilitates system expansion by allowing new classifiers or meta-models to be easily defined, in accordance with the SOLID object-oriented design principles [24].