3. Related Work
There is a vast literature on using inertial sensors for position and orientation estimation, covering various application fields and the challenges of estimating and parameterizing orientation [20]. The accuracy of these estimates depends on the choice of prediction model and the algorithm employed [21]. Improving accuracy remains a significant research challenge.
Researchers have gained diverse experience in building human motion tracking models based on Inertial Measurement Units (IMUs), including new sensing techniques, placement of sensors on different body parts, and combination with wireless communication technologies. Huang et al. [22] examined the strengths and weaknesses of approaches based on IMUs and multi-view imagery. Supporting their findings, Prasad et al. [23] demonstrated that six fundamental human activities can be identified using a simple sensor. Ronao et al. [24] collected smartphone activity recognition data and applied a hidden Markov model to classify six typical human activities. Mo et al. [25] developed an online physical activity recognition system using wireless wearable sensors that can be recharged through body energy harvesting; recognition was performed with a Random Forest (RF) algorithm. Yan et al. [26] conducted an in-depth investigation of sensors for healthcare, examining all aspects of the design of human activity recognition (HAR) systems. They analyzed wearable, environmental, and hybrid sensors, highlighting their strengths and weaknesses, and described data preprocessing, feature learning, and classification techniques, with particular attention to manual and automatic feature extraction.
Several applications have been developed with interesting methodological approaches. Tao et al. [27] proposed a method for recognizing human activities using an attention mechanism with multiple IMU sensors placed at various body locations. The captured signals were preprocessed in the frequency domain, and the most discriminative features were extracted. They introduced a sensor attention mechanism, integrated into a trainable deep neural network (DNN) layer, to determine the importance of each sensor. A focus-based casting module was used to represent the features precisely. The model was evaluated on five public datasets covering daily activities, sports, and automotive maintenance tasks, and the results showed superior performance compared with state-of-the-art methods.
Qing [28] developed a human motion capture model, based on MEMS sensors and a Zigbee network, that corrects drift and attitude-angle errors. He employed a Kalman filter that integrates gyroscope, accelerometer, and magnetometer data to compensate for these errors. To enhance tracking accuracy, he applied one algorithm based on sensor sensitivity and another that accounted for the connections and constraints of human joints. Testing used sensor data from the wrists, elbows, and shoulders of five subjects across 40 postures. The average basic tracking accuracy was 0.075 m, which improved by 15% with the enhanced algorithms. In addition, a backpropagation (BP) neural network was used in the motion identification module to build a human recognition system on the Android platform that detects falls and raises early alarms; the resulting model was reported to be more efficient and accurate than comparable ones.
Xefteris et al. [29] studied the performance of wearable IMU sensors and vital signs for human activity recognition (HAR), specifically in patients with motor disorders. They employed a feature selection (FS) technique [30] to extract the most relevant features from the public PAMAP2 dataset [31], which includes data from three IMUs, a heart rate sensor, and a temperature sensor. Measurements were recorded from nine subjects performing 12 different activities. The study used five classifiers (CART, Random Forest (RF), k-nearest neighbors (kNN), Linear Discriminant Analysis (LDA), and Support Vector Machines (SVM)) together with feature fusion techniques such as concatenation and Principal Component Analysis (PCA). The RF classifier initially achieved an accuracy of 86.64%, which improved substantially with different fusion modes, reaching approximately 95% with RF and CART; a Gradient Boosting Machine (GBM) stacking late-fusion mode improved accuracy further. The best feature selection technique was Recursive Feature Elimination (RFE). Compared with other studies on the same dataset, the methods applied in this study achieved the highest performance.
Yang et al. [32] developed a wearable device with multiple sensors to collect muscle activity and movement data simultaneously. Instead of EMG electrodes, they used air pressure sensors [33,34] to measure muscle activity more practically and cost-effectively. The prototype includes a control module, an IMU, an air pressure module, and a power module. The researchers collected data from seven men and one woman performing 11 activities over three hours. The extracted features were used as input to five classification algorithms: k-nearest neighbors (kNN), decision trees (DT), naïve Bayes (NB), Support Vector Machines (SVM), and Random Forest (RF). Three classification experiments were conducted: Activity Pattern Classification (APC), Movement Classification (MC), and Combined Classification (CC). DT, NB, and RF performed best, and CC achieved the highest average accuracy, precision, recall, and F-measure, highlighting the performance gain from integrating air pressure sensing into the device.
Patil et al. [35] created a model that combines LiDAR signals and inertial sensors to estimate 3D human poses in real time, addressing the limitations of vision-based, IMU-based, and heterogeneous-sensor methods. LiDAR data track the body position, while IMU data estimate the orientation and position of the joints. The study used an Ouster OS-0 LiDAR sensor and Xsens IMU sensors to track the subject and reconstruct the pose on a 3D avatar. To test the system, subjects were asked to walk, sit, and move toward a table. The measured errors were 2.82 cm for the pelvis and 2.42 cm for the foot, confirming the improved accuracy of tracking human movement with multiple heterogeneous sensors.
5. Data Analysis
This research aimed to develop firmware that could collect inertial signals using a microcontroller with an embedded module capable of processing tasks locally, predict the activities being performed, and transfer the collected data to a server to assess the motor skills of the monitored adults.
Figure 8 shows the diagram of the process steps.
A program called “graph” was developed to enable real-time visualization on both the local PC and the remote server console. It provides a GUI, built with the PyQtGraph library, that displays time-series plots of the detected activities.
5.1. Data Set
Inertial data sampled every 50 ms (20 Hz) are used to identify four activities: grasping, walking, leg flexion, and arm circular movement. The data were collected from five subjects (three males and two females) aged between 35 and 50. Each activity was performed twice, on different days, in a realistic setting within the Biorobotics Laboratory at the University of Florence. The subjects wear the STM32WB55G device, which incorporates the ISM330DMCX sensor. The device is compact and lightweight, measuring approximately 85 mm × 54 mm × 1.6 mm and weighing around 40 g including the battery. It attaches securely to the body with an adjustable elastic strap, ensuring comfort and stability during movement. Depending on the activity being monitored, the device can be fastened around the upper arm, the wrist, or the thigh.
For walking, the device is typically positioned on the upper arm or wrist to capture the arm swing and overall body motion accurately. The adjustable strap ensures that the device remains securely in place without causing discomfort, allowing for natural movement. The energy consumption for this activity is approximately 1.03 J.
For arm circular movements, the device is attached to the upper arm to monitor the circular motion of the arm. This placement allows the sensors to detect the full range of motion and angles of the arm, resulting in precise data for analyzing motor skills. The energy consumption for arm circular movements is approximately 0.95 J.
In grasping activities, the device is secured to the wrist to capture the fine motor movements of the hand and fingers. This positioning ensures that the sensors accurately detect the grasping motion and grip strength, with an energy consumption of approximately 0.89 J.
For leg flexion activities, the device is attached to the upper leg or thigh to monitor the flexion and extension of the knee joint. This allows the sensors to detect precise movements and angles, providing valuable data for motor skill analysis. The energy consumption for leg flexion is approximately 0.93 J.
This setup facilitates accurate data collection while minimizing user discomfort. The device’s lightweight and compact design ensures it does not interfere with daily activities, making it suitable for the continuous monitoring of older people.
The dataset contains 60,000 samples. For each of the four activities, the data are organized into separate matrices for the accelerometer and the gyroscope. Each matrix has 1500 rows and six columns: each row corresponds to one timestamped reading, the first three columns contain the triaxial (x, y, z) inertial data, and the remaining three columns contain the activity type, the reference date, and the timestamp. The recordings cover activities performed by adults for 60 s. Each reading carries a precise timestamp, and the time interval between consecutive readings (delta time) is also recorded.
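To make the matrix layout concrete, the following minimal sketch loads one activity matrix into a pandas DataFrame and derives the delta time between consecutive readings. The column names (`x`, `y`, `z`, `activity`, `date`, `timestamp`), timestamps expressed in seconds, the placeholder date, and the helper `load_activity_matrix` are illustrative assumptions, not the authors' file format.

```python
import numpy as np
import pandas as pd

# Hypothetical column names for one 1500 x 6 activity matrix:
# three inertial axes followed by the activity label, date, and timestamp.
COLUMNS = ["x", "y", "z", "activity", "date", "timestamp"]


def load_activity_matrix(rows):
    """Build a DataFrame from raw rows and add the delta time between readings."""
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Timestamps are assumed to be in seconds; diff() gives the interval
    # between consecutive readings (NaN for the first row).
    df["delta_t"] = df["timestamp"].diff()
    return df


# Synthetic example: 1500 random readings spaced 50 ms apart.
rng = np.random.default_rng(0)
n_rows = 1500
t = np.arange(n_rows) * 0.05
rows = [
    (ax, ay, az, "walking", "2024-01-01", ts)  # placeholder label and date
    for ax, ay, az, ts in zip(*rng.normal(size=(3, n_rows)), t)
]
walking = load_activity_matrix(rows)
print(walking.shape)  # (1500, 7) once delta_t is added
```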
Using the “graph” script, a program written specifically for this purpose, we extracted statistical features for each activity from the accelerometer and gyroscope data. We compiled the complete set of sensor readings and calculated various statistical features for each axis of both sensors. The Euclidean Norm Minus One (ENMO) values were also calculated to better differentiate between sedentary and light physical activities. This information helps researchers analyze movement patterns, changes in orientation, and other aspects of physical activity throughout the recorded period. The statistical features include the mean, mode, median, variance, standard deviation, mean deviation, root mean square, count, minimum, maximum, and several percentiles. Using these features improves the model’s accuracy in classifying the activities performed. The following paragraphs describe how each feature may contribute to the classification.
Starting with the mean, the grasping activity may show a lower average movement value than other activities, as grasping involves smaller and more controlled movements. In contrast, the circular motion of the arm may present a higher average due to a wider range of motion. The average for leg flexion could vary depending on the intensity and frequency of the flexes, while walking might reveal a moderate average, reflecting continuous and regular movement.
Regarding the mode, for grasping it could indicate the most common hand position during the action, and for circular arm motion the most frequent arm position throughout the movement. For leg flexion, it could represent the most common flexion angle, while for walking it may reflect the most frequent limb position during movement.
Considering the median, the grasping activity may yield a value close to the mean, pointing to consistent movements. The circular arm movement could result in a higher median, indicating wider movements. The median for leg flexion might display variability similar to the average, while walking could present a stable median, reflecting regular movement.
Regarding variance and standard deviation, the grasping activity will likely have low values, indicating controlled and consistent movements. In contrast, the circular movement of the arm may exhibit high values due to the variability of wide motions. The flexion of the legs may show varying values, which depend on the intensity of the flexion. Walking is expected to produce moderate values, reflecting its regularity.
When analyzing the mean deviation and the root mean square, the grasping activity may again show low values, suggesting smooth movements. The arm’s circular movements might present higher values due to their variability. Leg flexion could display values similar to those seen in variance, while walking will likely have moderate values that signify regular movement.
Examining the minimum and maximum values shows that the grasping activity may have closely grouped values, indicating restricted movement. The circular movement of the arm is expected to show a significant difference between the minimum and maximum, reflecting its wide range of motion. The flexion of the legs may also exhibit variable differences, depending on the intensity of the flexion, while walking might display moderate differences that indicate consistent movement.
Considering percentiles, the grasping activity could have closely grouped percentiles, signifying an even distribution of movements. The circular arm movements may show a wide range of percentiles, reflecting their variability. The flexion of the legs might exhibit varying percentiles similar to variance, whereas walking could demonstrate moderate percentiles that indicate regular movement.
Finally, regarding the count, the grasping activity may have a high count if it occurs frequently. The count for circular arm movement may vary with the duration of the activity. Leg flexion could also have a high count if performed repetitively, while walking is expected to produce a moderate count that reflects the duration of the walk.
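The per-axis statistics and the ENMO value described in this section can be sketched as follows. This is a minimal illustration, assuming accelerometer readings expressed in units of g and the common ENMO definition max(‖a‖ − 1, 0); the function names are hypothetical and do not reproduce the authors’ “graph” script. The mode is computed on rounded values, since raw floating-point readings rarely repeat exactly.

```python
import numpy as np
import pandas as pd


def enmo(acc_xyz):
    """Euclidean Norm Minus One, truncated at zero (input in g, shape (n, 3))."""
    norm = np.linalg.norm(acc_xyz, axis=1)
    return np.maximum(norm - 1.0, 0.0)


def axis_features(signal, decimals=2):
    """Statistical features for one sensor axis."""
    s = pd.Series(signal, dtype=float)
    return {
        "mean": s.mean(),
        "mode": s.round(decimals).mode().iloc[0],  # most frequent rounded value
        "median": s.median(),
        "variance": s.var(),  # sample variance (ddof=1)
        "std": s.std(),
        "mean_deviation": (s - s.mean()).abs().mean(),
        "rms": float(np.sqrt(np.mean(s.to_numpy() ** 2))),
        "count": int(s.count()),
        "min": s.min(),
        "max": s.max(),
        "p25": s.quantile(0.25),
        "p75": s.quantile(0.75),
    }


def extract_features(acc_xyz, gyro_xyz):
    """Flatten per-axis features for both sensors and add the mean ENMO."""
    features = {}
    for sensor, data in (("acc", acc_xyz), ("gyro", gyro_xyz)):
        for i, axis in enumerate("xyz"):
            for name, value in axis_features(data[:, i]).items():
                features[f"{sensor}_{axis}_{name}"] = value
    features["enmo_mean"] = float(enmo(acc_xyz).mean())
    return features


# Synthetic window: roughly 1 g on the z axis plus noise, random angular rates.
rng = np.random.default_rng(0)
acc = rng.normal([0.0, 0.0, 1.0], 0.1, size=(1500, 3))
gyro = rng.normal(0.0, 5.0, size=(1500, 3))
features = extract_features(acc, gyro)
```

Each window thus becomes one row of 73 features (12 statistics × 3 axes × 2 sensors, plus ENMO), which is the shape a tabular classifier such as Random Forest expects.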
In real time, we sent the graphical representation of the movement data to the monitoring system using the “graph” script, to be displayed on the remote console (Figure 9, Figure 10, Figure 11 and Figure 12). Medical staff will later evaluate the motor skills of the monitored subjects in the clinic to decide which therapeutic actions to take.
Connecting a Xiaomi Redmi Note 11 Pro smartphone to the STM32WB55G via Bluetooth Low Energy (BLE) and using the Android STBLE sensor app allows us to view the data transmitted by the sensors. The app plots these data on the phone screen (see Figure 13) and logs them for later transfer.
The acquired data and their statistical features are then passed to a predictive model designed to recognize the activities being performed.
5.2. Evaluation and Analysis Methodology
This section outlines the methods used to conduct tests, measure energy consumption, evaluate motor capacity, and analyze the system’s use. The goal is to provide a detailed description of the procedures adopted to ensure the accuracy and reliability of the obtained results.
The tests were designed to simulate real-world usage conditions, enabling a comprehensive system performance evaluation. Energy consumption was measured using precision instrumentation, while motor capacity was assessed through specific tests tailored to meet user needs. Additionally, the analysis of system usage considered various aspects, including functionality, usability, and overall usefulness.
Tests were conducted using a series of standardized exercises that assessed balance, mobility, and muscle strength.
Test Methodology
Grasping:
Description: Participants must grasp and lift objects of different sizes and weights.
Results: The Random Forest model achieved 0.998576 accuracy, 0.998675 precision, 0.998775 recall, and a 0.998763 F1-score.
Arm Circular Movement:
Description: Participants must perform circular arm movements, both clockwise and counterclockwise.
Results: The Random Forest model achieved 0.998885 accuracy, 0.998875 precision, 0.998885 recall, and a 0.998857 F1-score.
Leg Flexion:
Description: Participants must perform leg flexions (knee bends), alternating between deep and shallow bends.
Results: The Random Forest model achieved 0.998665 accuracy, 0.998657 precision, 0.998665 recall, and a 0.998648 F1-score.
Walking:
Description: Participants must walk on a set path, either in a straight line or with changes of direction.
Results: The Random Forest model achieved 0.998885 accuracy, 0.998765 precision, 0.998786 recall, and a 0.998783 F1-score.
The data collected can be used to personalize interventions and improve users’ quality of life. In addition, the system has proven helpful in identifying fall risks and monitoring progress in rehabilitation programs.
Energy consumption was calculated from the current measured with a digital multimeter. The multimeter was set to current measurement mode and connected in series with the device’s power supply while the device was running, and the current was read directly from the display. This method provided accurate and reliable measurements of the current drawn during the tests. The average energy consumed was 0.95 J, calculated as E = V · I · t with V = 3.3 V and t = 60 s.
The analysis of computational efficiency focused on processing time, power consumption, and latency.
Processing Time: Our system uses the STM32WB55 microcontroller, which is known for its energy efficiency and processing capabilities. The average processing time is 50 ms per detection cycle. This short processing time is crucial for real-time applications, such as monitoring the health of older people, where rapid responses are essential.
Power Consumption: Thanks to its low-power optimized architecture, the STM32WB55 microcontroller consumes an average of 16.5 mW during active processing. This feature is particularly advantageous for wearable devices that require extended battery life.
Latency: Latency is a key metric for human activity recognition applications because it directly affects system responsiveness; low latency ensures that the system can respond quickly to changes in activity. With a processing time of 50 ms per cycle, the measured total latency is 110 ms (50 ms of processing plus an average of 60 ms of TCP/IP communication).
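The energy, power, and latency figures reported in this section can be related to one another with a short worked example. The sketch assumes a constant average current over the 60 s window and uses only the values given in the text (3.3 V, 60 s, 0.95 J, 16.5 mW, 50 ms, 60 ms); the function names are hypothetical.

```python
def energy_joules(current_a, voltage_v=3.3, duration_s=60.0):
    """Energy E = V * I * t, assuming a constant average current."""
    return voltage_v * current_a * duration_s


def current_from_energy(energy_j, voltage_v=3.3, duration_s=60.0):
    """Average current implied by a measured energy, I = E / (V * t)."""
    return energy_j / (voltage_v * duration_s)


def end_to_end_latency_ms(processing_ms=50.0, communication_ms=60.0):
    """Total latency as processing time plus average TCP/IP communication time."""
    return processing_ms + communication_ms


i_avg = current_from_energy(0.95)   # average current for 0.95 J over 60 s at 3.3 V
p_avg_mw = 0.95 / 60.0 * 1000       # average power over the same window, in mW
e_from_power = 16.5e-3 * 60.0       # energy for 16.5 mW sustained for 60 s, in J
print(f"{i_avg * 1000:.2f} mA, {p_avg_mw:.1f} mW, {e_from_power:.2f} J, "
      f"{end_to_end_latency_ms():.0f} ms")
# prints: 4.80 mA, 15.8 mW, 0.99 J, 110 ms
```

The measured 0.95 J corresponds to an average draw of about 4.8 mA, and the 16.5 mW active-processing figure would amount to 0.99 J over the same 60 s, so the two reported values are of the same order.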
Additional analyses were conducted, including evaluations of motor ability and the practical application of the model.
Assessing the motor skills of older adults is essential for monitoring their health and well-being. While preliminary tests were initially performed on younger subjects, these early results provide a strong foundation for expanding the research to older populations. The system described in this study classifies specific actions the subjects perform and offers a comprehensive assessment of their motor skills. This process is vital for identifying any abnormalities or declines in motor abilities, enabling timely and targeted interventions. The assessment methodology is based on several parameters, including the following:
Efficiency. This measures the speed and fluidity of movements by evaluating the time taken to complete a task and the accuracy of the movements;
Safety. This examines stability and the risk of falls while performing tasks, measured by the number of accidents or near-accidents recorded during monitoring;
Independence. This determines the patient’s degree of autonomy in performing daily activities, assessed by how often the patient requires assistance to complete tasks.
The findings for the mentioned parameters are as follows:
Efficiency. The average time to complete the grasping task was 5 s, with a movement accuracy of 99%;
Safety. One accident occurred during the walking tests, and some issues were observed during leg flexion;
Independence. Assistance was requested only once for grasping an object across all tests conducted throughout the day.
These preliminary results provide a foundation for expanding research to older adults. The continuous monitoring system allows for observing and analyzing specific movements, such as grasping, walking, leg flexion, and circular arm movements. These movements were chosen for their clinical relevance and impact on the quality of life for older adults.
Integrating this IoT application into remote care services enhances the quality of life for elderly individuals and supplies healthcare professionals with valuable information to assess motor skills. The system enables the timely identification of any decline in motor skills, promoting targeted and personalized interventions.
To evaluate the model’s usage, participants were asked to provide ratings from 1 to 5 based on the following parameters:
Functionality. This parameter assesses the speed of the model’s responses;
Usability. This parameter evaluates how easy it is to use the model;
Utility. This parameter measures the model’s effectiveness in promoting user independence.
Table 1 presents a rating scale with specific meanings assigned to each value.
The evaluations obtained are presented in Table 1.
Figure 14 shows the mean and standard deviation of the ratings for the three parameters examined.
Based on user feedback, the characteristics of each movement (grasping, arm circular movement, walking, and leg flexion) were assessed in terms of ergonomics and reliability.
Ergonomics refers to how easy it is to perform a movement without causing stress or fatigue. In contrast, reliability indicates users’ perceptions of the movement’s stability and error-free execution.
Using the evaluation scale defined earlier, an average rating close to 5 was obtained, as illustrated in Figure 15.
In light of these results, the model proved effective and easy to use, with a simple user interface. However, it has so far been used only in laboratory settings and with non-elderly subjects. Extending its application to real-world environments involving older individuals is essential to fully validate the model, which for now can only be regarded as a potential tool.
In the context of this study, extending the application to older subjects will allow us to assess how the model responds to age-related factors such as slower movements, difficulty in movement, and tremors, which are commonly seen in conditions like Parkinson’s disease.
Older adults tend to move more slowly due to decreased muscle strength and coordination, which can impact the accuracy of the sensors in detecting and classifying movements. Additionally, difficulty in movement often results in irregular or incomplete motions, leading to potential errors in activity classification. Tremors can introduce noise into sensor data, interfering with the model’s ability to recognize tasks accurately.
To address these challenges, the model incorporates several features. Standardizing data characteristics ensures that all variables are measured on the same scale, allowing the model to treat data uniformly. This standardization helps the model recognize slower movements such as a leg flexion performed by an older person as valid, rather than classifying them as abnormalities.
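A minimal sketch of feature standardization (z-scoring, x′ = (x − μ)/σ) with scikit-learn’s `StandardScaler` follows; the feature values are hypothetical. The key design point is that the per-feature mean and standard deviation are estimated on the training data and then reused unchanged on data from new subjects, so that all subjects are expressed on the same scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: rows are observation windows, columns are features
# on very different scales (e.g., an acceleration statistic in g and a
# gyroscope statistic in degrees per second).
X_train = np.array([[0.10, 120.0],
                    [0.30, 80.0],
                    [0.20, 100.0]])
X_new = np.array([[0.25, 90.0]])  # e.g., a window from a new, slower-moving subject

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn per-feature mean and std on training data
X_new_std = scaler.transform(X_new)          # apply the same mean and std to new data
print(X_new_std)
```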
Data filtering is also crucial for reducing the noise caused by tremors, which are prevalent in older adults. Techniques like the Kalman filter can merge data from multiple sensors and minimize noise. For instance, if an elderly person experiences tremors, the Kalman filter can assist in differentiating genuine movements from the tremors, thereby enhancing the accuracy of task recognition.
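The following is a minimal sketch of a scalar Kalman filter applied to a single inertial axis, assuming a random-walk model for the true motion and Gaussian measurement noise. The noise variances (`process_var`, `meas_var`), the synthetic 8 Hz tremor, and the helper name `kalman_smooth_1d` are illustrative assumptions, not the filter used in the study. In this form the filter acts as an adaptive low-pass filter: it attenuates the fast tremor component but also slightly lags genuine movements, so the variances would need tuning on real data. Fusing several sensors (e.g., gyroscope and accelerometer) would require a multivariate state model that is not shown here.

```python
import numpy as np


def kalman_smooth_1d(measurements, process_var=0.05, meas_var=0.1, p0=1.0):
    """Scalar Kalman filter with a random-walk state model.

    x_k = x_{k-1} + w,  w ~ N(0, process_var)   (slowly varying true signal)
    z_k = x_k + v,      v ~ N(0, meas_var)      (noisy sensor reading)
    """
    z = np.asarray(measurements, dtype=float)
    x = z[0]  # initialize the state with the first reading
    p = p0    # initial state uncertainty
    estimates = np.empty_like(z)
    for k, zk in enumerate(z):
        # Predict: the state is assumed unchanged, so only the uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (zk - x)
        p = (1.0 - gain) * p
        estimates[k] = x
    return estimates


# Synthetic signal sampled every 50 ms (20 Hz): a slow 0.5 Hz movement,
# an 8 Hz "tremor" component, and white sensor noise.
t = np.arange(0, 10, 0.05)
movement = np.sin(2 * np.pi * 0.5 * t)
rng = np.random.default_rng(1)
noisy = movement + 0.3 * np.sin(2 * np.pi * 8 * t) + rng.normal(0, 0.2, t.size)
smoothed = kalman_smooth_1d(noisy)
```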
Furthermore, extracting statistical characteristics allows for the more effective capturing of variations in older adults’ movements. Statistical features such as mean, variance, and standard deviation are particularly beneficial. For example, the average acceleration data can indicate the general activity level, while the variance may reveal the erratic movements. By utilizing these features, the model can better adapt to the specific movement patterns of older adults, accurately recognizing activities even when movements are slower or less coordinated.
Finally, robust models such as Random Forest, which are less prone to overfitting, can substantially improve the model’s ability to generalize to the movements of older adults, thereby reducing classification errors.
The movement patterns of older adults can vary significantly from one individual to another. Random Forest is a machine learning technique that creates multiple decision trees and combines their outcomes to make a final prediction. This method enhances the model’s ability to recognize activities in various conditions accurately. Furthermore, customizing the model can help identify and rectify age-related errors. For instance, if the model detects a specific walking pattern for an older individual, it can adapt to recognize this pattern as expected for that person. This customization is achieved through continuous learning, where the model is regularly updated with new data to enhance its accuracy.
The extension of this model for elderly users has been postponed to a second phase, as it requires a careful and methodical approach, which will take time to define and organize. The key activities include the following:
Obtaining Ethics Committee approval to ensure that all activities are conducted ethically;
Identifying elderly participants and their available caregivers for involvement in the study;
Ensuring that all participants give their informed consent;
Involving experts to guarantee the quality and accuracy of evaluations;
Conducting tests to assess the cognitive abilities of participants;
Developing a detailed protocol to guide all activities;
Providing adequate training on the use of the model;
Carrying out tests according to the established protocol;
Collecting and analyzing the data obtained from the tests;
Gathering feedback on the ratings provided;
Adjusting the model based on the feedback received;
Offering emotional and psychological support to participants to ensure their well-being throughout the process;
Organizing follow-up sessions to monitor progress and provide additional support;
Promoting community involvement to raise awareness and support for older individuals.
5.3. Prediction
The predictive model used is Random Forest, a supervised learning algorithm that applies bagging (bootstrap aggregating) to a collection of decision trees. It is an ensemble learning method, applicable to both classification and regression (here, classification), that combines the predictions of multiple models to produce more accurate results than any single model.
One key benefit of Random Forest is that it reduces overfitting, a common issue with individual decision trees. The model consists of many decision trees, each trained on a bootstrap sample (a random sample drawn with replacement) of the training data. Each tree makes its prediction independently, and the final prediction is determined by a majority vote among all the trees.
A critical feature of the Random Forest model is the randomness in selecting features for each tree. This randomness decreases the correlation among the trees, improving the overall robustness of the model. The process can be broken down into several phases, as follows:
Creating subsets. At each node of a tree, a random subset of features is selected from the full feature set. This subset is typically much smaller than the total number of available features (for classification, commonly about the square root of that number);
Choosing the best feature. Among the randomly selected features, the one that best separates the data is determined using a quality measure such as entropy or the Gini index;
Repeating the process. This process is repeated for every node of every tree in the forest, so that each split, and hence each tree, is built from a different subset of features;
Setting hyperparameters. Before training the model, several important hyperparameters must be configured (see Table 2).
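The phases above can be illustrated with a minimal from-scratch sketch that builds a forest out of scikit-learn decision trees: each tree is fitted on a bootstrap sample, considers a random subset of √p features at each split (`max_features="sqrt"`), chooses splits by the Gini index, and the forest predicts by majority vote. The synthetic four-class dataset and the helper names `fit_forest` and `predict_majority` are assumptions for illustration. Note that scikit-learn’s own `RandomForestClassifier` combines its trees by averaging their class-probability estimates rather than by a hard majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for i in range(n_trees):
        # Bootstrap: draw n_samples row indices with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        # max_features="sqrt": each split considers a random subset of features;
        # criterion="gini": split quality is measured with the Gini index.
        tree = DecisionTreeClassifier(criterion="gini", max_features="sqrt",
                                      random_state=i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_majority(trees, X):
    """Predict each sample's class by majority vote over all trees."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    # Labels must be non-negative integers for np.bincount.
    return np.array([np.bincount(column).argmax() for column in votes.T])


# Synthetic 4-class problem standing in for the four activities.
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=4, random_state=0)
trees = fit_forest(X[:300], y[:300])
y_pred = predict_majority(trees, X[300:])
print("hold-out accuracy:", np.mean(y_pred == y[300:]))
```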
The Grid Search technique was employed to identify the parameters that optimize the model’s predictions. This process involved defining a grid of parameter values and the number of cross-validation folds. The parameters considered were as follows:
n_estimators, [100, 200];
max_depth, [None, 10];
min_samples_split, [2, 5];
min_samples_leaf, [1, 2];
max_features, [auto, sqrt].
All possible combinations of these parameters were evaluated. For each combination, the model was trained with the ‘fit’ method and assessed using five-fold cross-validation (n = 5): it was trained and evaluated five times, each time holding out a different portion of the data as the validation set. The combination with the best cross-validated performance was then selected as the optimal set of parameters. The optimal parameters identified were as follows:
n_estimators, 100;
max_depth, None;
min_samples_split, 2;
min_samples_leaf, 1.
In total, 160 fits were run during the search, corresponding to the 32 parameter combinations in the grid (2 × 2 × 2 × 2 × 2) each evaluated on five folds.
The training phase used 80% of the acquired data.
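A minimal sketch of this tuning procedure with scikit-learn’s `GridSearchCV` follows. The stratified 80/20 split, accuracy scoring, random seeds, and the helper name `tune_random_forest` are assumptions. Because current scikit-learn releases no longer accept `max_features="auto"` (for classifiers it was equivalent to `"sqrt"`), the sketch uses `"log2"` as the second option, which keeps the grid at 32 combinations.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split

# Grid from the text, with "auto" replaced by "log2" (see the note above).
PARAM_GRID = {
    "n_estimators": [100, 200],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
    "max_features": ["sqrt", "log2"],
}


def tune_random_forest(X, y, param_grid=PARAM_GRID, folds=5, seed=42, n_jobs=None):
    """Hold out 20% for testing, then run an exhaustive k-fold grid search."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    search = GridSearchCV(RandomForestClassifier(random_state=seed), param_grid,
                          cv=folds, scoring="accuracy", n_jobs=n_jobs)
    search.fit(X_train, y_train)  # one fit per (combination, fold) pair
    # The best combination is refitted on all training data and scored on the test set.
    test_accuracy = search.score(X_test, y_test)
    return search, test_accuracy


n_combinations = len(ParameterGrid(PARAM_GRID))
print(n_combinations, "combinations x 5 folds =", n_combinations * 5, "fits")
# prints: 32 combinations x 5 folds = 160 fits
```

Calling `search, acc = tune_random_forest(X, y)` on a feature matrix then exposes the selected values in `search.best_params_`.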
In the subsequent prediction step, the model correctly recognized the four activities.
To validate the reliability of the model, the following metrics were computed: accuracy, precision, recall, and F1-score (see Table 3).
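The metrics reported in Table 3 can be computed with scikit-learn as in the minimal sketch below. The text does not state whether the values are per class or averaged, so the sketch shows both a support-weighted summary and the per-class report; the labels are hypothetical, not the study’s data, and `summarize_metrics` is an illustrative helper.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             precision_recall_fscore_support)

ACTIVITIES = ["grasping", "arm_circular", "leg_flexion", "walking"]


def summarize_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 over the classes."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration only.
y_true = np.array(["walking", "grasping", "leg_flexion", "arm_circular", "walking", "grasping"])
y_pred = np.array(["walking", "grasping", "leg_flexion", "walking", "walking", "grasping"])
print(summarize_metrics(y_true, y_pred))
print(classification_report(y_true, y_pred, labels=ACTIVITIES, zero_division=0))
```

With support-weighted averaging, recall equals accuracy by construction: each class’s recall TP_k/n_k is weighted by n_k/N, so the weighted sum reduces to ΣTP_k/N.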
Based on the values in Table 3, the model makes highly accurate predictions. The next step is to plot the ROC curves and compute the AUC (Figure 16).
Figure 16 illustrates the ROC curve representing the classification model’s performance. The False Positive Rate (FPR) is shown on the x-axis, and the True Positive Rate (TPR) on the y-axis. The area under the ROC curve is the AUC (Area Under the Curve), which indicates how well the model can discriminate between the activity classes examined. A value of 1 means that the model separates the positive and negative classes without errors.
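For a multi-class problem such as the four activities, ROC curves are usually drawn one-vs-rest: each class in turn is treated as positive and all others as negative. The minimal sketch below assumes class-probability scores such as those returned by a fitted classifier’s `predict_proba` (with columns ordered as its `classes_`); the toy scores and the helper name `one_vs_rest_roc` are hypothetical. It requires at least three classes, because `label_binarize` returns a single column for a binary problem.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize


def one_vs_rest_roc(y_true, y_score, classes):
    """Per-class ROC curves (FPR on x, TPR on y) and their AUC values."""
    y_bin = label_binarize(y_true, classes=classes)  # shape (n_samples, n_classes)
    curves = {}
    for k, cls in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_bin[:, k], y_score[:, k])
        curves[cls] = (fpr, tpr, auc(fpr, tpr))
    return curves


# Hypothetical, perfectly separated scores: 0.775 for the true class, 0.075 otherwise.
classes = [0, 1, 2, 3]
y_true = np.array([0, 1, 2, 3, 0, 1, 2, 3])
y_score = np.eye(4)[y_true] * 0.7 + 0.075
curves = one_vs_rest_roc(y_true, y_score, classes)
for cls, (_, _, area) in curves.items():
    print(f"class {cls}: AUC = {area:.3f}")
```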
To validate its performance, the model was compared with a neural network, the self-normalizing neural network (SNN). This type of network is designed to keep the activations of its neurons within a desired range without external normalization techniques such as batch normalization: the output of each layer remains normalized as it serves as input to the next layer. SNNs use self-normalizing activations that automatically converge toward zero mean and unit variance. This is achieved through the SELU (Scaled Exponential Linear Unit) activation function, which accelerates convergence during training and allows the network to learn faster and more effectively than with other activation functions, without additional processing. The SELU can be expressed mathematically as follows:
$$\mathrm{SELU}(x) = \lambda \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases} \qquad \lambda \approx 1.0507,\quad \alpha \approx 1.67326$$
Graphically, it can be represented as in Figure 17.
The SELU (Scaled Exponential Linear Unit) function is continuous over its entire domain, including the origin, where both branches evaluate to zero, so the curve has no jumps. It is differentiable everywhere except at the origin, where the slope changes from λα ≈ 1.758 on the left to λ ≈ 1.0507 on the right (λ and α being the scale and alpha parameters described below); as with ReLU, gradient-based training handles this single point without difficulty.
In the positive region (x > 0), SELU is linear: the output increases proportionally with the input, and the scale parameter λ ≈ 1.0507 sets the slope.
In the negative region (x < 0), SELU applies an exponential transformation that produces negative outputs saturating at −λα ≈ −1.758 instead of clipping them to zero. Because the gradient remains non-zero for negative inputs, this helps avoid the “dying ReLU” problem, in which neurons stop updating during training. The alpha parameter α ≈ 1.67326 controls the saturation level in this region.
Thanks to this self-normalizing property, activations propagated through SELU layers (with suitably initialized weights) tend toward zero mean and unit variance, which makes it possible to train deep networks with many layers and makes the learning process more robust.
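The properties listed above can be checked numerically. The sketch below assumes the full-precision constants published with SELU (λ ≈ 1.0507009873554805, α ≈ 1.6732632423543772, the values also used by PyTorch's `torch.nn.SELU`). It evaluates the one-sided slopes at the origin, the negative saturation level, and the mean and second moment of SELU applied to a standard normal input. The helper `gaussian_moments` is a hypothetical name introduced only for this illustration.

```python
import math

# Full-precision SELU constants; the text rounds them to 1.0507 and 1.67326.
LAMBDA = 1.0507009873554805
ALPHA = 1.6732632423543772


def selu(x: float) -> float:
    """lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise."""
    if x > 0:
        return LAMBDA * x
    return LAMBDA * ALPHA * math.expm1(x)


def gaussian_moments(f, lo: float = -10.0, hi: float = 10.0, steps: int = 20000):
    """E[f(Z)] and E[f(Z)**2] for Z ~ N(0, 1), via the trapezoidal rule."""
    h = (hi - lo) / steps
    m1 = m2 = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        fz = f(z)
        m1 += weight * fz * pdf
        m2 += weight * fz * fz * pdf
    return m1 * h, m2 * h


if __name__ == "__main__":
    eps = 1e-6
    print("right slope at 0:", (selu(eps) - selu(0.0)) / eps)  # ~ lambda = 1.0507
    print("left slope at 0:", (selu(0.0) - selu(-eps)) / eps)  # ~ lambda * alpha = 1.7581
    print("saturation:", selu(-20.0))                           # ~ -lambda * alpha
    mean, second_moment = gaussian_moments(selu)
    print("E[selu(Z)] =", mean, " E[selu(Z)^2] =", second_moment)  # ~ 0 and ~ 1
```

The last line illustrates the self-normalizing property: for zero-mean, unit-variance inputs, the outputs again have a mean close to 0 and a variance close to 1.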
The metrics obtained are reported in Table 4.
The ROC curve was also derived for the SNN (Figure 18).
Comparing the metrics shows that the Random Forest model significantly outperforms the SNN on all key indicators. A detailed examination of the individual metrics reveals the following:
Accuracy reflects the proportion of correct predictions among all predictions made. The Random Forest model achieves accuracy of 0.998885 for arm circular movements, 0.998885 for walking, 0.998576 for grasping, and 0.998665 for leg flexion, correctly classifying nearly all samples. In contrast, the SNN model has an accuracy of 0.91 for arm circular movements, 0.87 for walking, 0.87 for grasping, and 0.67 for leg flexion, indicating more classification errors;
Precision is the proportion of true positives among all samples classified as positive. The Random Forest model has a precision of 0.998875 for arm circular movements, 0.998765 for walking, 0.998675 for grasping, and 0.998657 for leg flexion, so almost every sample it assigns to a class truly belongs to that class. Conversely, the SNN model has a precision of 0.84 for arm circular movements, 0.79 for walking, 0.88 for grasping, and 0.81 for leg flexion, which indicates more false positives, that is, samples assigned to a class they do not belong to;
Recall measures the proportion of true positives among all actual positive samples. The Random Forest model achieves recall of 0.998885 for arm circular movements, 0.998786 for walking, 0.998775 for grasping, and 0.998665 for leg flexion, identifying more than 99.8% of the positive samples. In comparison, the SNN model’s recall is only 0.91 for arm circular movements, 0.87 for walking, and 0.87 for grasping, indicating that it misses a significant number of positive samples and therefore produces more false negatives;
F1 score is the harmonic mean of precision and recall, providing a balanced measure of a model’s performance. The Random Forest model achieves F1 scores of 0.998857 for arm circular movements, 0.998783 for walking, 0.998763 for grasping, and 0.998648 for leg flexion, reflecting a good balance between precision and recall. In contrast, the SNN model’s F1 scores are 0.87 for arm circular movements, 0.83 for walking, 0.88 for grasping, and 0.74 for leg flexion, indicating difficulty in maintaining an effective balance between these metrics.
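The four metrics above can all be derived from a confusion matrix. The following sketch computes them per class in a one-vs-rest fashion; reading the per-class accuracy reported in the text as one-vs-rest accuracy is our assumption, and the confusion-matrix counts are invented for illustration, not taken from Table 4.

```python
from typing import Dict, List, Sequence


def per_class_metrics(cm: List[List[int]], labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest accuracy, precision, recall and F1 for each class.
    cm[i][j] counts samples whose true class is i and predicted class is j."""
    total = sum(sum(row) for row in cm)
    metrics = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                 # class k predicted as something else
        fp = sum(row[k] for row in cm) - tp  # other classes predicted as k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[name] = {"accuracy": (tp + tn) / total, "precision": precision,
                         "recall": recall, "f1": f1}
    return metrics


if __name__ == "__main__":
    labels = ["arm_circle", "walking", "grasping", "leg_flexion"]
    # Hypothetical counts for illustration only (rows: true class, columns: predicted).
    cm = [[8, 1, 1, 0],
          [0, 9, 0, 1],
          [1, 0, 9, 0],
          [0, 2, 0, 8]]
    for name, m in per_class_metrics(cm, labels).items():
        print(name, {key: round(value, 3) for key, value in m.items()})
```

Because F1 is a harmonic mean, for any single class it always lies between that class's precision and recall, which offers a quick sanity check on reported values.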
The evaluation also included the ROC curves generated for the proposed model and the comparison network. Once again, the Random Forest model demonstrated superior performance.
Overall, the consistently higher metrics indicate that the Random Forest model is more reliable and accurate in predicting activities, producing fewer false positives and false negatives.
In human activity recognition (HAR), the choice of predictive model is crucial for accuracy and reliability. In this study, we initially compared two models, Random Forest and the self-normalizing neural network (SNN). Although both performed well, with Random Forest showing higher accuracy and robustness, we explored further solutions that could improve activity recognition. The model chosen was the LSTM (Long Short-Term Memory) network, an advanced solution for HAR thanks to its ability to process time series and capture long-term dependencies. This makes it well suited to continuous and complex movements, improving accuracy in recognizing activities that unfold over time. Unlike traditional models, LSTMs are designed to retain relevant information over extended periods, which suits sequential data such as those from wearable sensors, and they can better handle temporal variations in the data. Their structure can also help the model generalize, reducing the risk of overfitting.
The LSTM architecture is composed of memory cells that can retain information for long periods. Each LSTM unit has three main gates: the input gate, the output gate, and the forget gate. These gates regulate the flow of information through the network, allowing the LSTM to remember or discard information as needed. To implement an LSTM model for HAR, the data acquired by the sensors (accelerometer and gyroscope) are segmented into time windows, and these sequences are used as input to the LSTM network, which learns the temporal characteristics of the data. The architecture of this network is shown in Figure 19; the results obtained with it were also significant.
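The pipeline described above (segmenting sensor streams into windows and feeding each sequence to an LSTM) can be illustrated with a deliberately tiny sketch. The window size, step, weights, and the hidden size of 1 are assumptions chosen for readability; a practical model would use vector-valued hidden states, trained weights, and a classification layer on top of the final hidden state, typically built with a framework such as PyTorch or Keras. The code shows how the input, forget, and output gates update the cell state at each time step.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sliding_windows(samples: Sequence[Vector], size: int, step: int) -> List[List[Vector]]:
    """Cut a stream of IMU samples (e.g. [ax, ay, az, gx, gy, gz]) into
    fixed-length windows; windows overlap whenever step < size."""
    return [list(samples[s:s + size]) for s in range(0, len(samples) - size + 1, step)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x: Vector, h: float, c: float, W: Dict[str, Vector],
              U: Dict[str, float], b: Dict[str, float]) -> Tuple[float, float]:
    """One step of a standard LSTM cell, reduced to a hidden size of 1.
    W[g], U[g], b[g]: input weights, recurrent weight and bias of gate g."""
    def pre(gate: str) -> float:
        return sum(w * xi for w, xi in zip(W[gate], x)) + U[gate] * h + b[gate]

    i = sigmoid(pre("i"))    # input gate: how much new information to write
    f = sigmoid(pre("f"))    # forget gate: how much of the old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much of the cell state to expose
    g = math.tanh(pre("g"))  # candidate values for the cell state
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(window: List[Vector], W, U, b) -> float:
    """Feed one window through the cell; the final hidden state summarizes it."""
    h, c = 0.0, 0.0
    for x in window:
        h, c = lstm_step(x, h, c, W, U, b)
    return h


if __name__ == "__main__":
    # Synthetic 6-channel stream and arbitrary, untrained weights.
    stream = [[0.1 * t, 0.0, 1.0, 0.0, 0.0, 0.05 * t] for t in range(10)]
    windows = sliding_windows(stream, size=4, step=2)
    W = {k: [0.1] * 6 for k in "ifog"}
    U = {k: 0.5 for k in "ifog"}
    b = {k: 0.0 for k in "ifog"}
    print(len(windows), [round(run_lstm(w, W, U, b), 4) for w in windows])
```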
The evaluation metrics are reported graphically in Figure 20 and numerically in Table 5; they show that the model performed strongly.
Comparing the two models, the Random Forest model performed better than the LSTM model. However, the LSTM model demonstrated a strong ability to recognize human activities, especially movements with complex temporal patterns.
Integrating Random Forest with LSTM for human activity recognition could yield a more robust, accurate, and adaptable system. The combination would leverage the strengths of both models, improving performance and reducing errors. Integration is especially advantageous for analyzing time series: Random Forest classifies features computed from individual samples or windows very accurately, but it does not model temporal dependencies, whereas LSTM is designed to process time series and capture long-term dependencies, which makes it well suited to continuous and complex movements that develop over time.
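The text does not specify how Random Forest and LSTM would be combined. One simple option, shown here purely as an assumption, is late fusion: both models output class probabilities for the same window, and a weighted average decides the final label. The weight `w_rf` and the probability values are hypothetical; another option would be to feed the LSTM's final hidden state to the Random Forest as extra features.

```python
from typing import Dict

Probs = Dict[str, float]


def fuse_probabilities(p_rf: Probs, p_lstm: Probs, w_rf: float = 0.5) -> Probs:
    """Late fusion: weighted average of the per-class probabilities of the two
    models, renormalized so the fused scores sum to 1."""
    if not 0.0 <= w_rf <= 1.0:
        raise ValueError("w_rf must lie in [0, 1]")
    if p_rf.keys() != p_lstm.keys():
        raise ValueError("both models must score the same classes")
    fused = {k: w_rf * p_rf[k] + (1.0 - w_rf) * p_lstm[k] for k in p_rf}
    total = sum(fused.values())
    return {k: v / total for k, v in fused.items()}


def predict(probs: Probs) -> str:
    """Class with the highest fused probability."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # Hypothetical outputs of the two models for one window of sensor data.
    p_rf = {"arm_circle": 0.10, "walking": 0.50, "grasping": 0.10, "leg_flexion": 0.30}
    p_lstm = {"arm_circle": 0.05, "walking": 0.25, "grasping": 0.05, "leg_flexion": 0.65}
    print(predict(fuse_probabilities(p_rf, p_lstm, w_rf=0.5)))  # leg_flexion
    print(predict(fuse_probabilities(p_rf, p_lstm, w_rf=0.8)))  # walking
```

Shifting the weight changes which model dominates: with `w_rf=0.5` the LSTM's strong preference wins, while with `w_rf=0.8` the Random Forest's prediction prevails. In practice the weight would be tuned on validation data.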
Another aspect to consider is the data sampling frequency. Random Forest can handle data sampled at different frequencies and still provide robust classification. LSTM, on the other hand, can exploit high-frequency data to capture fine details of movements, while lower-frequency data can be used to reduce noise and improve model stability. Integrating the two models allows these aspects to be balanced, optimizing precision and overall robustness.
Regarding motion characteristics, Random Forest can effectively classify activities from parameters such as acceleration, speed, and rotation, while LSTM can analyze how these features evolve over time, capturing dynamic variations and improving the understanding of complex movements. This combination allows a more complete and accurate view of human activities.
Managing transitions between different tasks is another strength of RF-LSTM integration. Random Forest can provide accurate classification of individual activities, while LSTM can handle transitions between activities by capturing temporal dependencies, improving accuracy in recognizing activities that change over time. This is especially useful for tasks that involve continuous movement and smooth transitions.
Additionally, Random Forest can effectively classify tasks of different durations, providing a solid foundation for analysis. LSTM, on the other hand, can handle long-running tasks, capturing time dependencies and improving accuracy in recognizing tasks that develop over time. The integration of the two models allows them to effectively manage tasks of different durations, improving overall accuracy.
Finally, noise filtering is crucial for improving data quality. Random Forest is robust to noisy data and can handle noise in sensor data effectively, while LSTM can benefit from preprocessing techniques, such as noise filtering, that improve data quality before temporal dependencies are learned. Integrating the two models makes it possible to reduce noise and improve the overall quality of recognition.
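As an example of the noise filtering mentioned above, a centered moving average is one of the simplest low-pass filters for accelerometer or gyroscope channels. The text does not state which filter, if any, was applied; this sketch and its `width` parameter are illustrative assumptions.

```python
from typing import List, Sequence


def moving_average(signal: Sequence[float], width: int) -> List[float]:
    """Centered moving average (a simple low-pass filter). Near the edges the
    window is truncated, so each output averages only the samples available."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        window = signal[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    # A single spike, e.g. a momentary accelerometer glitch, is spread out and damped.
    print(moving_average([0.0, 0.0, 3.0, 0.0, 0.0], width=3))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

A wider window suppresses more noise but also blurs fast movement components, so the width would need to be chosen relative to the sampling frequency and the duration of the movements of interest.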
In addition to the benefits of integrating the two models, it is important to consider some features of Random Forest that make it an excellent choice for activity classification.
First, Random Forest is known for its robustness and tends to be less prone to overfitting than models such as neural networks, which may explain its superior performance in this analysis. In addition, Random Forest is more interpretable than neural networks, giving better insight into the features that most influence its predictions. Training time is another consideration: neural networks often require longer training and more computational resources than Random Forest.
These factors lead to the conclusion that Random Forest is an excellent choice for activity classification. Comparing its characteristics with those of logistic regression, another common machine learning algorithm, further supports this choice for the types of movements examined. In particular, Random Forest offers advantages in handling nonlinearities, estimating variable importance, tolerating missing data, and versatility.
Random Forest is particularly effective at handling nonlinearities: it copes well with data that involve nonlinear relationships and complex interactions between variables. This is crucial for motion recognition, where patterns can be highly nonlinear. Logistic regression, in contrast, is a linear model and can struggle to capture the complexity of motion data.
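The point about nonlinearity can be made concrete with the classic XOR pattern, in which the label depends on an interaction between two features. No linear decision boundary classifies all four XOR points correctly, so a logistic regression can get at most three of them right, whereas a depth-2 decision tree (the building block of a Random Forest) fits them exactly. The toy data and the simplified tree learner below are illustrative assumptions, not the study's data or implementation.

```python
import math

# Toy XOR data: the label is 1 exactly when the two binary features differ.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 0]


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Linear logistic regression trained by batch gradient descent on log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                gw[j] += (p - yi) * xj
            gb += p - yi
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda x: int(sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0)


def gini(labels):
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2


def majority(labels):
    return int(2 * sum(labels) >= len(labels))


def fit_tree(X, y, max_depth, depth=0):
    """Greedy decision tree: axis-aligned threshold splits chosen by weighted Gini."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
            fit_tree([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1))


def tree_predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def accuracy(predict, X, y):
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


if __name__ == "__main__":
    linear = fit_logistic(X, y)
    tree = fit_tree(X, y, max_depth=2)
    print(accuracy(linear, X, y))                           # never above 0.75 on XOR
    print(accuracy(lambda x: tree_predict(tree, x), X, y))  # 1.0
```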
Another important aspect is robustness and accuracy. Random Forest, as an ensemble of decision trees, tends to be more robust and less prone to overfitting than a single model, which leads to more accurate predictions. Logistic regression, by contrast, is more likely to underfit complex, nonlinear data, because its decision boundary is linear in the input features.
Random Forest provides an estimate of variable importance, which makes it possible to identify which features of the motion signals are most relevant for predicting activities such as grasping, arm circular movements, leg flexion, and walking. Logistic regression provides coefficients for the variables, but these reflect only linear effects and are directly comparable only when the features share a common scale, so they do not offer the same clarity about relative importance as Random Forest.
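Variable importance can be estimated in several ways. The sketch below uses permutation importance, a model-agnostic method: shuffle one feature column and measure how much accuracy drops. This is not necessarily the measure used in the study; scikit-learn, for instance, exposes both the impurity-based `feature_importances_` attribute of `RandomForestClassifier` and `sklearn.inspection.permutation_importance`. The toy features and the stand-in predictor are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def permutation_importance(predict: Callable[[Row], int], X: List[Row], y: Sequence[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when a single feature column is randomly shuffled.
    A large drop means the model relies on that feature."""
    rng = random.Random(seed)

    def accuracy(rows: List[Row]) -> float:
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            permuted = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


def model(row: Row) -> int:
    """Hypothetical stand-in for a trained classifier: uses only feature 0."""
    return int(row[0] >= 10)


if __name__ == "__main__":
    # Hypothetical features: [acceleration magnitude, irrelevant counter].
    X = [[float(i), float(i % 3)] for i in range(20)]
    y = [int(i >= 10) for i in range(20)]
    print(permutation_importance(model, X, y))  # the second feature scores 0.0
```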
When it comes to missing data, Random Forest tends to be more tolerant and, depending on the implementation, can handle missing values during training. Logistic regression, on the other hand, requires more careful handling of missing data, such as imputation, which can introduce bias.
Finally, versatility is another point in favor of Random Forest, which can be used for both classification and regression problems, offering more flexibility for any future extensions of the project. Logistic regression, on the other hand, is limited to binary or multi-class classification problems.
6. Discussion
This paper proposes a model for recognizing human movements developed as part of an assistance service for vulnerable or moderately fragile older adults. The project aims to promote healthy lifestyles and facilitate health monitoring. The system has several key functions, including the following:
- Motion tracking;
- Data analysis to identify potential anomalous situations.
Caregivers guide older adults to perform activities such as grasping, walking, flexing their legs, and making circular arm movements to monitor their motor skills. The chosen technology is user-friendly and space-efficient, which enhances its acceptability among older users. The system is designed to minimize any discomfort elderly individuals may experience while using technology. Furthermore, caregivers play a crucial role in helping older adults overcome their reluctance towards technology by providing practical guidance on how to use these tools. This model has several distinguishing features compared to other studies and applications in human activity recognition.
Technology: This system uses a multi-protocol dual-core wireless STM55WB microcontroller, designed for energy efficiency and robust processing, which makes it well suited to IoT applications based on edge machine learning and to wearable devices that require low power consumption while maintaining high performance. The device integrates a high-performance 3D digital accelerometer and a 3D digital gyroscope with machine learning cores. This combination captures linear acceleration and rotational movements, enabling the precise detection of specific actions such as grasping, arm circular movements, leg flexion, and walking. The model is optimized to recognize these movements, which are essential for applications in physical rehabilitation, health monitoring, and the gesture control of devices.
Methodology: The approach uses Random Forest as the predictive model. Random Forest can manage large datasets and deliver accurate, robust predictions, and its resistance to overfitting makes it well suited to real-world applications. The model is also scalable: additional movements or sensors can be included to extend its detection and recognition capabilities, so the system can be adapted to different needs and applications. Furthermore, the methodology includes a continuous learning algorithm that improves the system’s accuracy over time by adapting to the movement patterns of individual users.
Type of classes used: Focusing on these movements represents a significant advance in monitoring and supporting older adults, as it generates valuable data, enhances quality of life, and facilitates personalized interventions. The system groups movements into classes such as fine motor skills (e.g., grasping small objects), gross motor skills (e.g., walking), and complex movements (e.g., arm circular movements). This classification allows a comprehensive assessment of an individual’s motor abilities and helps tailor specific rehabilitation programs.
Features: The device is designed for ease of use and low power consumption. Its simple mode of operation lets seniors use it with minimal assistance, and its compact design makes it easy to wear and carry in everyday life. In addition, the device provides real-time feedback and alerts, allowing an immediate response to any anomalies detected in the user’s movements. Integration with healthcare applications allows caregivers and healthcare professionals to remotely monitor the health status of older adults, ensuring timely interventions and continuous support.
Many human activity recognition systems rely on multiple accelerometers or combinations of sensors, which can be cumbersome. Using several sensors often leads to inconsistent motion observations across devices, increasing costs, complicating use, and potentially causing discomfort for the wearer. In contrast, this model uses a single sensor, which proved effective during the classification phase and achieved satisfactory accuracy. The reliable data obtained support our choice of an inertial module as the sole input to the prediction model. Previous studies have also successfully used single sensors [23, 41, 42].
In the literature on classifying human activities, numerous studies employ different prediction models and report varying results. The proposed model is designed for monitoring the health of older adults at home or in nursing homes, whereas other studies report a range of applications, including physical rehabilitation, health monitoring, and the gesture control of devices. The proposed model also focuses on specific movements (grasping, leg flexion, circular arm movement, and walking), while other studies examine different sets of activities, including daily and sports movements. For example, Tao et al. [27] analyzed daily, sports, and automotive maintenance activities. These differences may reflect the different needs and objectives of the various studies, as well as the technologies and methodologies used. Other authors offer an overview of studies and applications that use inertial sensors to monitor human activities. For example, Huang et al. [22] examined the strengths and weaknesses of IMU-based and multi-view image approaches, while Prasad et al. [23] demonstrated that six fundamental human activities can be identified with a simple sensor. These studies highlight the effectiveness of MEMS sensors in activity monitoring and their applicability in settings such as physical rehabilitation and health monitoring.
One of the main differences between the proposed model and other studies concerns the technology used. While the proposed model relies on a specific microcontroller (STM55WB) with built-in MEMS sensors, other studies report a variety of sensors and technologies, including multiple sensors and different data fusion techniques. For example, Huang et al. [22] use multi-view images, while Mo et al. [25] employ wearable sensors that can be recharged through body energy harvesting. This variety of approaches can offer benefits such as increased accuracy and robustness, but also increased complexity and cost. Another significant difference concerns the artificial intelligence algorithms used. The proposed model uses Random Forest to classify activities, whereas other studies use a variety of algorithms, including neural networks, hidden Markov models, and other machine learning techniques. For example, Ronao et al. [24] apply hidden Markov models, while Mo et al. [25] use Random Forest. This diversity of algorithms can affect the accuracy and robustness of activity recognition systems.
This study chose the Random Forest algorithm as an alternative to the neural networks commonly employed by other researchers. Classifying the accelerometric data with this prediction model demonstrated the effectiveness of the Random Forest algorithm: in most cases, events were identified accurately, with performance exceeding that of previous studies using neural networks. The validity of Random Forest as a predictive model has been established in earlier experiments, where it was compared with various neural networks [43, 44]. Random Forest improves classification by randomizing the training process: each tree is trained on a bootstrap sample of the data, and only a random subset of features is considered at each split, which decorrelates the trees and reduces the variance of the ensemble. Shuffling the order of the data, a separate but valuable preprocessing step in machine learning, offers further benefits for model training. First, it helps avoid order bias, preventing the model from learning patterns tied to the specific data sequence, which could hinder generalization to new, unseen data. By breaking fixed patterns in the ordering, shuffling improves the model’s ability to generalize. This is especially important for reducing overfitting, as it prevents the model from memorizing patterns associated with the order of the data.
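In the standard Random Forest formulation, randomness enters through bootstrap sampling of the training rows and through random feature subsets at each split, while shuffling the data order is a separate preprocessing step. The sketch below illustrates both ideas with hypothetical helper names (`bootstrap_indices`, `split_candidates`, `shuffled_split`); it does not build the trees themselves.

```python
import random
from typing import List, Tuple


def bootstrap_indices(rng: random.Random, n_samples: int) -> List[int]:
    """Row indices drawn with replacement: the training set of one tree."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def split_candidates(rng: random.Random, n_features: int, max_features: int) -> List[int]:
    """Random subset of features considered at one split; a Random Forest
    draws a fresh subset at every node of every tree."""
    return sorted(rng.sample(range(n_features), max_features))


def shuffled_split(n_samples: int, test_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the sample order before splitting, so neither subset inherits
    the acquisition order (e.g. all recordings of one activity in a row)."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    n_test = round(n_samples * test_fraction)
    return order[n_test:], order[:n_test]


if __name__ == "__main__":
    rng = random.Random(42)
    rows = bootstrap_indices(rng, 10_000)
    out_of_bag = 1 - len(set(rows)) / 10_000
    print(f"out-of-bag fraction: {out_of_bag:.3f}")  # close to 1 - 1/e, about 0.368
    print("features at one split:", split_candidates(rng, n_features=6, max_features=2))
    train, test = shuffled_split(100, test_fraction=0.2)
    print(len(train), len(test))  # 80 20
```

About 36.8% (roughly 1/e) of the rows are left out of each bootstrap sample; these out-of-bag rows are commonly used to estimate a Random Forest's generalization error without a separate validation set.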
The novelty of this work lies in its simplified recognition method, high recognition accuracy, reduced computational load through local processing, diversity of recognizable activities, and its potential for integration and communication with smart home technologies and IoT platforms for monitoring activities.
The proposed model offers a specific, targeted solution for monitoring the health of older people, emphasizing ease of use and sustainability. Other studies, however, describe approaches and technologies that can deliver benefits such as increased accuracy and robustness, although these alternatives may also introduce greater complexity and cost.
While the proposed system is effective, it has some limitations related to the subjects and the testing environment. The tests were conducted with healthy young participants, so future experiments with older individuals and people with limited mobility are necessary. In addition, the tests took place in simulated living environments with few obstacles; extending them to real homes and nursing homes with various layouts would be beneficial. Future research could also explore movements beyond those typically studied and evaluate the effectiveness of integration into a smart home system. A hybrid approach that combines RF with convolutional neural networks (CNNs) or Long Short-Term Memory (LSTM) networks could enhance the robustness and generalization of the classification system: Random Forest models offer stability and interpretability, while neural networks are better at adapting to complex variations in the data.