1. Introduction
Stroke is a major cause of mortality and long-term disability, and low- and middle-income countries bear most of this burden, accounting for approximately 70% of global stroke incidence and 87% of stroke-related deaths and disability-adjusted life years (DALYs) [1]. Stroke is typically caused by disrupted cerebral blood flow, which leads to neuronal damage and cell death [2]. Conventional rehabilitation relies on clinicians and therapists to repeatedly stimulate relevant neural pathways through structured motor training to promote functional recovery [3]. Such rehabilitation protocols, and the data collection they involve, require patients to repeat standard motions many times, which inevitably makes training repetitive and tedious. Virtual reality (VR) can counteract this by substantially increasing patient engagement during these repetitive tasks [4]. In practice, however, the limited availability of therapists frequently constrains the intensity and continuity of rehabilitation services [5]. As a result, home-based rehabilitation has become an essential complement to clinical care, with therapists prescribing individualized programs that patients can perform outside the hospital. In parallel, recent advances in Internet of Things (IoT) technologies have enabled wearable-sensor-based motion capture and cloud analytics for remote monitoring and near-real-time assessment, helping reduce clinical workload, while rehabilitation robots can deliver force feedback and motion guidance to improve movement execution and training consistency in home settings [6].
Within home-based rehabilitation, patients typically follow the therapist’s instructions and self-report their progress. Prior work suggests that more than 90% of rehabilitation activities occur at home [7]. Despite its convenience, adherence is often undermined by patients’ uncertainty about whether their movements are performed correctly, which can prolong recovery and increase overall treatment costs [8,9]. IoT-supported feedback, particularly when presented as intuitive visual reports, can help patients interpret movement quality, while robot-assisted training can provide personalized guidance to support engagement and adherence [10]. More broadly, interactive rehabilitation systems have been shown to increase patient involvement [6,11] and are associated with meaningful functional improvements [12,13,14]. Such systems also generate rich multimodal data streams (e.g., skeletal kinematics and electromyography) through sensing technologies [15,16,17], enabling quantitative analyses of rehabilitation progress and motor performance [18,19].
For rehabilitation motion assessment, clinical scales such as the Fugl–Meyer Assessment (FMA) [20] and the Wolf Motor Function Test (WMFT) [21], as well as many home-based exercise protocols, place particular emphasis on upper-limb motor function [16,22]. Although these established scales are widely used, they are potentially subjective: scores can vary with the assessing clinician’s background, and because the scales are based on discrete value ranges, they may fail to capture continuous changes in motor quality [4]. In [22], the authors characterized movement performance in terms of three components, namely range of motion (ROM), smoothness, and compensation, each captured by kinematic factors; smoothness is especially informative for analyzing velocity-related patterns and tremor. Liao et al. [23] further categorized rehabilitation assessment approaches into discrete movement scoring, rule-based methods, and template-based modeling. To address these assessment needs, artificial intelligence (AI), and deep learning (DL) architectures in particular, has become pivotal. Broadly, AI enables machines to perform tasks that require human-like cognition. DL models are a powerful subset of machine learning (ML) techniques that differ from traditional ML classifiers in one key respect: rather than relying on hand-crafted features, DL models can be trained directly on complex raw data, often leveraging the hierarchical filtering effect of convolutional layers [24,25,26]. Discrete classification models (e.g., support vector machines (SVMs) and Random Forest) can effectively distinguish movement categories, but they may be less sensitive to subtle and continuous changes in motor quality [27,28]. Template-based methods compare patient trajectories with reference patterns using probabilistic density models such as Gaussian mixture models (GMMs) [29] and hidden Markov models (HMMs) [30] to represent variability; however, constructing robust multi-level models remains challenging. To enhance interpretability, scoring functions are often used to transform template outputs into normalized scores (e.g., 0–1 or 0–100) [23,29]. Although a recent survey highlights various DL models currently employed in remote monitoring for home-based rehabilitation [31], the application of DL classifiers to explicitly discriminate the affected side in bilateral tasks remains largely unexplored.
Despite recent progress in automated rehabilitation assessment, existing methodologies exhibit two critical limitations that hinder their clinical utility in home-based settings. First, affected-side identification is rarely explicit: most current frameworks evaluate bilateral upper-limb motor function globally or assume that the impaired side is known in advance. In bilateral rehabilitation tasks, this merges kinematic data from the healthy and impaired limbs, diluting the sensitivity of the assessment and failing to isolate the actual motor deficit. Second, continuous quality quantification is inadequate: although discrete scoring models (e.g., SVM and Random Forest) can effectively classify broad movement categories or impairment levels, their coarse category boundaries make them less sensitive to subtle, continuous changes in motor quality, so they fail to capture the fine-grained daily improvements that are crucial for patient motivation. Conversely, continuous matching methods such as dynamic time warping (DTW) can reveal nuanced differences, but they often operate directly on raw sensor trajectories, which reduces their robustness to sensor noise and natural inter-subject variability.
In addition, the clinical validation of prior automated rehabilitation-assessment frameworks remains limited and highly heterogeneous. Lee et al. [22] collected Kinect recordings from 15 stroke survivors and 11 healthy subjects and used therapist-provided reference scores to assess exercise quality; however, their framework was designed for exercise-level scoring rather than explicit side-specific bilateral discrimination. Kim et al. [32] enrolled 41 patients with hemiplegic stroke and used Kinect-based motion capture to estimate upper-extremity Fugl–Meyer scores; however, their task formulation focused on item-wise clinical score prediction rather than continuous trajectory-quality modeling in bilateral reaching tasks. Liao et al. [29] explicitly noted that their deep learning framework was validated primarily on rehabilitation data from healthy subjects and that a substantial portion of the dataset lacked clinician-provided ground-truth quality labels. Therefore, among the representative studies cited here that are most closely related to automated quantitative rehabilitation assessment [22,29,30,32], only a limited subset incorporated real stroke-patient motion data, and even fewer combined patient-specific high-dimensional kinematic measurements with therapist- or clinician-anchored reference standards for continuous quality estimation. Although the present pilot study involved a smaller cohort of five stroke patients and three healthy controls, it was specifically designed to capture synchronized multimodal bilateral reaching data and to anchor the resulting probabilistic modeling to clinically meaningful side-specific assessment. This distinction is important because, in bilateral upper-limb rehabilitation, side-specific impairment may be obscured if pathological motion data are sparse, absent, or not explicitly linked to clinically grounded scoring criteria.
Addressing these limitations is crucial because failing to isolate the affected side can produce misleading global scores that mask subtle, localized motor deficits, which may lead therapists to prescribe inappropriate training intensities. Furthermore, without continuous, fine-grained quality quantification, small day-to-day improvements go untracked, which can reduce patient motivation and ultimately delay functional recovery.
To address the limitations of existing rehabilitation assessment systems, namely the insensitivity of discrete scoring methods to subtle motor changes, the lack of affected-side identification in previous template-based approaches, and the tendency of current systems to emphasize training delivery over objective progress quantification, this study proposes a novel integrated assessment framework. The core contributions of this paper are summarized as follows:
Hybrid Side-Specific Assessment Pipeline: Unlike previous works that evaluate upper-limb function globally, the proposed framework adopts a sequential methodology. It first employs a deep learning multi-class classifier to explicitly identify the affected side (left, right, or healthy) from smoothness-related kinematic features, so that the subsequent quality evaluation is targeted and side-specific.
Interpretable Continuous Quality Quantification: This study extends traditional template-based modeling by conditioning a Gaussian Mixture Model (GMM) on the identified affected side. A calibrated scoring function maps the resulting log-likelihoods onto an intuitive 0–1 index. Crucially, unlike many existing theoretical models, this index is evaluated against therapist-defined gold standards using motion data from real stroke patients at Taipei Veterans General Hospital.
Synergistic AIoT and Robotic Architecture: Moving beyond conventional VR platforms, the proposed framework combines wearable Internet of Things (IoT) sensing, near real-time AWS cloud analytics, and a force-feedback rehabilitation robot. This multifaceted integration not only ensures movement execution fidelity via physical guidance but also fuses multimodal data streams to enhance the statistical robustness of remote motor assessment in home-based settings.
The rest of this paper is organized as follows: Section 2 reviews related work; Section 3 details the proposed system architecture and methodology; Section 4 presents the experimental results; Section 5 provides the discussion; and, finally, Section 6 concludes the paper.
3. Method
3.1. Participants
To evaluate the computational feasibility and system stability of the proposed assessment framework, this research was conducted as an initial proof-of-concept pilot study. A total of 5 post-stroke participants and 3 healthy control subjects were enrolled. The experiment was conducted at the rehabilitation department of Taipei Veterans General Hospital. All participants provided written informed consent prior to the experiment.
The system integrates a rehabilitation robot equipped with a multi-degree-of-freedom force-feedback arm, which guides the reaching trajectories and delivers adaptive resistance or assistance tailored to the patient’s real-time motor capability. Multiple reaching trials were administered within a structured virtual-reality-based assessment setting to obtain sufficiently representative motion data for model development [4].
The inclusion criteria for the post-stroke cohort were as follows:
Basic cognitive ability, sufficient to understand the experimental process and follow simple instructions.
Partial voluntary hand movement, such as lifting to the side through a small angle.
Ability to tolerate 1 to 5 rehabilitation exercise units, depending on the individual’s motor ability.
All post-stroke participants were right-handed. A therapist assessed each patient’s motor performance during the dynamic reaching exercise and provided a ground-truth score, which was used for correlation analysis in the subsequent evaluation. In addition, three healthy right-handed participants were recruited to perform the same task and provide reference skeletal-motion data. The demographic and clinical characteristics of all participants, including age, gender, and affected region, are summarized in Table 1.
3.2. Rehabilitation System Design
3.2.1. System Introduction
The proposed VR-based stroke rehabilitation system integrates Kinect [33], 3D stereo glasses, a 3D projection display, a 3D graphics card, and related hardware modules to deliver upper-limb rehabilitation exercises for stroke patients. Using the Unity 3D engine, the authors implement a dynamic reaching task that targets upper-limb extension, postural balance during reaching, and hand–eye coordination. To enhance sensing fidelity and system deployability, the authors incorporate an Internet of Things (IoT) architecture (Figure 1) with wearable inertial measurement units (IMUs) that include tri-axial accelerometers and gyroscopes. The IMUs are worn on the wrist and upper arm to acquire multimodal motion signals (e.g., acceleration and angular velocity) at 100 Hz, complementing Kinect measurements by providing higher-resolution characterization of subtle movement components such as tremor. In addition, the system integrates a rehabilitation robot equipped with a force-feedback robotic arm that guides reaching trajectories and delivers adaptive resistance according to the patient’s motor capability, thereby supporting consistent and correct task execution. Motion streams from the Kinect and IMUs are transmitted via Wi-Fi to an Amazon Web Services (AWS) cloud platform for near-real-time processing and storage, enabling remote monitoring and longitudinal review by therapists.
3.2.2. Task Content
The user interface of the dynamic reaching exercise is illustrated in Figure 2, and the corresponding schematic is provided in Figure 3. The task is designed as a bilateral ball-throwing-and-catching scenario, in which the participant uses both arms to repeatedly intercept a ball that follows a parabolic trajectory. During each trial, stroke patients are instructed to extend both upper limbs to catch successive balls, thereby training coordinated reaching performance. The total number of target catches is configurable by the therapist to match the patient’s rehabilitation stage and tolerance. To provide transparent feedback on task execution, the VR display reports key outcome statistics in real time, including the number of successful catches, the number of failed attempts, and the current streak of consecutive successful catches.
To support safe and consistent movement execution, the rehabilitation robot delivers haptic feedback during catching events and adapts the assistance/resistance profile according to the patient’s Fugl–Meyer Assessment Upper Extremity (FMA-UE) score, enabling individualized guidance. In addition, real-time performance metrics are computed on AWS Lambda and rendered on the VR interface as interactive bar charts, allowing patients to track immediate progress and facilitating sustained task engagement.
3.2.3. Difficulty Design Mechanism
The difficulty of the dynamic reaching exercise is configurable through the following parameters.
Horizontal falling distance of the sphere: The horizontal displacement of the falling sphere can be adjusted according to the patient’s upper-limb extension capability (i.e., reachable workspace). To accommodate different functional levels in the bilateral VR task, the system provides multiple reach-range settings, including 30–50%, 30–75%, and 30–100% of the target range.
Required number of successful catches: Therapists can specify the target number of successful catches to regulate training duration and endurance demand; the number of catches a patient completes also indicates how long they can sustain continuous practice.
Falling speed of the sphere: The falling speed is specified in units of gravitational acceleration (G). Therapists can adjust this setting according to the patient’s impairment level to train reaction time and hand–eye coordination.
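As an illustration only, the three difficulty parameters can be grouped into a small, validated configuration object. The class name `ReachingDifficulty`, the default catch count, the uniform sampling of the landing offset, and the `max_reach` argument are assumptions made for this sketch. The source does not give a falling-speed value, so that field defaults to `None`.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Reach-range presets named in the text, as fractions of the target range.
REACH_RANGES = {
    "30-50%": (0.30, 0.50),
    "30-75%": (0.30, 0.75),
    "30-100%": (0.30, 1.00),
}


@dataclass
class ReachingDifficulty:
    """Hypothetical container for the three therapist-configurable parameters."""

    reach_range: str = "30-50%"           # horizontal falling-distance preset
    target_catches: int = 10              # illustrative default, not from the source
    fall_speed_g: Optional[float] = None  # falling speed in units of G; no value given

    def __post_init__(self) -> None:
        if self.reach_range not in REACH_RANGES:
            raise ValueError(f"unknown reach range: {self.reach_range!r}")
        if self.target_catches <= 0:
            raise ValueError("target_catches must be positive")
        if self.fall_speed_g is not None and self.fall_speed_g <= 0:
            raise ValueError("fall_speed_g must be positive")

    def sample_horizontal_offset(self, rng: random.Random, max_reach: float) -> float:
        """Draw a landing offset uniformly inside the selected fraction of max_reach."""
        lo, hi = REACH_RANGES[self.reach_range]
        return rng.uniform(lo, hi) * max_reach
```

Validating in `__post_init__` means a mistyped preset fails when the session is configured rather than mid-exercise.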
3.2.4. Experimental Rehabilitation Platform
The experimental rehabilitation platform used in this study consisted of an integrated station comprising a display monitor, a webcam, and a host computer mounted on a dedicated support frame. Within the proposed cloud–robot–wearable framework, this station functioned as the interactive rehabilitation node for task presentation, participant supervision, and local system execution during the bilateral reaching assessment. The monitor was used to present the rehabilitation interface and task-related feedback to the participant, while the webcam enabled real-time visual monitoring throughout the experimental session. The host computer, positioned on the lower shelf of the platform, managed the execution of the rehabilitation software, local data acquisition, and synchronization with the sensing modules adopted in the proposed framework. This compact configuration provided a stable and reproducible setup for administering bilateral upper-limb assessment tasks in a controlled indoor environment.
3.3. Analysis Approach
To develop an assessment method for the dynamic reaching exercise that accounts for its bilateral nature, this study proposes a discrete movement score-based multi-class classifier that categorizes each participant as a healthy subject, a post-stroke subject with left-side impairment, or a post-stroke subject with right-side impairment. The classifier is constructed using smoothness-related features, as smoothness is a key indicator of motor control during continuous bilateral catching movements.
Prior to feature extraction, the raw sensor data were preprocessed. Because the Kinect (approximately 50 Hz) and the IMU sensors (100 Hz) sample at different rates, the streams had to be resampled to achieve accurate synchronization. The raw IMU signals underwent noise-reduction filtering, normalization, and segmentation. All multimodal data streams were then resampled and temporally aligned so that the kinematic metrics extracted from different hardware sources were synchronized and free of phase mismatches [24].
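A minimal sketch of the synchronization step is shown below, assuming NumPy, a centred moving-average filter as the noise-reduction step, per-channel z-score normalization, and linear interpolation onto a shared 100 Hz clock. The filter, window length, and interpolation scheme are illustrative assumptions rather than the exact preprocessing used in this study, and segmentation into reaching cycles is omitted.

```python
import numpy as np


def as_columns(x):
    """Return x as a float array of shape (n_samples, n_channels)."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def moving_average(x, window=5):
    """Centred moving-average low-pass filter applied to each channel."""
    x = as_columns(x)
    kernel = np.ones(window) / window
    return np.column_stack([np.convolve(x[:, k], kernel, mode="same") for k in range(x.shape[1])])


def zscore(x):
    """Per-channel z-score normalization."""
    x = as_columns(x)
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)


def resample(t_src, x_src, t_target):
    """Linearly interpolate every channel of x_src onto the timestamps t_target."""
    x_src = as_columns(x_src)
    return np.column_stack([np.interp(t_target, t_src, x_src[:, k]) for k in range(x_src.shape[1])])


def synchronize(t_kinect, kinect, t_imu, imu, fs_target=100.0):
    """Filter and normalize IMU data, then put both streams on one shared clock."""
    imu_clean = zscore(moving_average(imu))
    # Keep only the interval covered by both devices to avoid extrapolation.
    t0, t1 = max(t_kinect[0], t_imu[0]), min(t_kinect[-1], t_imu[-1])
    t_common = np.arange(t0, t1, 1.0 / fs_target)
    return t_common, resample(t_kinect, kinect, t_common), resample(t_imu, imu_clean, t_common)
```

Restricting the common clock to the overlapping interval keeps `np.interp` from silently holding edge values beyond either stream’s recorded range.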
After identifying the affected side, the proposed pipeline further quantifies movement performance to support therapist evaluation and to provide feedback that may facilitate patient engagement during home-based training. Specifically, this study adopts the modeling strategy in [46] to compare patient motion data with healthy-subject reference data collected in the dynamic reaching task, thereby producing an objective measure of movement quality. To enhance interpretability, the authors additionally apply the scoring function proposed in [29], converting the model output into a normalized performance score that can be readily understood by clinicians and patients. The overall pipeline of the analysis approach is summarized in Figure 4.
Although the participant cohort is small (N = 8), each participant performed multiple continuous dynamic reaching cycles across their prescribed rehabilitation units. Consequently, the feature matrices used to train the machine learning classifiers and the GMM were extracted at the segment level (i.e., individual reaching cycles) rather than aggregated at the subject level. This approach substantially expanded the effective dataset size, yielding hundreds of segment-level kinematic samples, which provided sufficient data volume for model optimization and helped mitigate the risk of overfitting.
3.3.1. Feature Extraction
The feature set in this study is designed primarily based on three prior works [
22,
47,
48]. Lee et al. [
22] introduced a comprehensive set of kinematic descriptors to characterize three movement-performance components. In this study, the focus is placed on the smoothness-related component, and the corresponding smoothness-based kinematic features are adopted for subsequent analysis.
The following notation is used:
j denotes a joint in the set J extracted from the Kinect joint data, where J = {left wrist (lw), right wrist (rw)}.
c denotes a movement coordinate in the set C = {x, y, z}.
t denotes the frame index.
T denotes the total number of frames.
F denotes the sampling frequency.
The following formulas are basic smoothness-based features:
The following formulas define the normalized speed and normalized jerk:
The following formula defines the Mean Arrest Period Ratio (MAPR), the proportion of frames in which the speed exceeds a target percentage (10%) of the maximum speed. Lee et al. [22] expected patients to make more unnecessary movements and therefore to attain higher MAPR values.
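Because the formula itself is not reproduced here, the sketch below follows the verbal definition above: it computes speed from consecutive Kinect wrist positions (scaled by the sampling frequency F) and then MAPR as the fraction of frames whose speed exceeds 10% of the peak. Deriving speed by simple finite differences is an assumption of this sketch.

```python
import numpy as np


def joint_speed(positions, fs):
    """Speed of one joint from (T, 3) Kinect coordinates sampled at fs Hz."""
    steps = np.diff(np.asarray(positions, dtype=float), axis=0)
    return fs * np.linalg.norm(steps, axis=1)


def mean_arrest_period_ratio(speed, fraction=0.10):
    """Proportion of frames whose speed exceeds `fraction` of the peak speed."""
    speed = np.asarray(speed, dtype=float)
    return float(np.mean(speed > fraction * speed.max()))
```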
The following formula defines the zero-crossing ratio, which measures the portion of a movement during which the sign of the acceleration or jerk changes. Participants with more tremor or more unnecessary movements are expected to attain a higher zero-crossing ratio.
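A minimal sketch of one possible zero-crossing ratio is shown below: the fraction of consecutive non-zero samples whose sign differs, applied to acceleration or jerk obtained by finite differences. The handling of exact zeros and the normalization by the number of sample pairs are assumptions; the exact definition in [22] may differ.

```python
import numpy as np


def zero_crossing_ratio(signal):
    """Fraction of consecutive non-zero samples whose sign differs."""
    signs = np.sign(np.asarray(signal, dtype=float))
    signs = signs[signs != 0]  # exact zeros are skipped rather than counted as crossings
    if signs.size < 2:
        return 0.0
    return float(np.mean(signs[1:] != signs[:-1]))


def derivative(x, fs):
    """Finite-difference derivative, e.g. speed -> acceleration -> jerk."""
    return np.diff(np.asarray(x, dtype=float)) * fs
```

Applying `derivative` twice to a speed profile gives jerk; a tremulous movement flips sign often and pushes the ratio toward 1.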
Archambault et al. [47] proposed a feature called the index of curvature (IC), which estimates the straightness of the movement trajectory. The formula is as follows:
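For illustration, and assuming the common definition of the index of curvature as the ratio of the travelled path length to the straight-line distance between the start and end points (so a perfectly straight reach gives IC = 1), the feature can be sketched as follows.

```python
import numpy as np


def index_of_curvature(positions):
    """Path length divided by start-to-end straight-line distance (IC >= 1)."""
    p = np.asarray(positions, dtype=float)
    path_length = np.linalg.norm(np.diff(p, axis=0), axis=1).sum()
    chord = np.linalg.norm(p[-1] - p[0])
    if chord == 0:
        raise ValueError("start and end points coincide; IC is undefined")
    return float(path_length / chord)
```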
Balasubramanian et al. [4] proposed the spectral arc-length metric, which uses the Fourier magnitude spectrum of the movement speed profile to assess movement smoothness. Consider a movement with speed profile v(t), t ∈ [0, T], and duration T. The formula is as follows:
where V(ω) is the Fourier magnitude spectrum of v(t), and [0, ωc] is the frequency band occupied by the given movement. The cutoff ωc = 40π rad/s (corresponding to 20 Hz) covers both normal and abnormal components of human movement, such as tremor.
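The sketch below computes a spectral arc length from a sampled speed profile under a standard discretization, which is an assumption here: the zero-padded FFT magnitude is normalized by its DC value V(0), restricted to [0, ωc] with ωc corresponding to 20 Hz, and the arc length of the resulting curve is summed with the frequency axis scaled by ωc. The zero-padding level is also an assumption, and the adaptive amplitude-threshold cutoff of later SPARC variants is not included. Values are negative, and less smooth movements yield more negative values.

```python
import numpy as np


def spectral_arc_length(speed, fs, fc_hz=20.0, pad_level=4):
    """Spectral arc length of a speed profile sampled at fs Hz.

    fc_hz = 20 Hz corresponds to the cutoff of 40*pi rad/s used in the text.
    """
    v = np.asarray(speed, dtype=float)
    # Zero-pad to a power of two (plus extra padding) for a finely sampled spectrum.
    nfft = int(2 ** (np.ceil(np.log2(len(v))) + pad_level))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    magnitude = np.abs(np.fft.rfft(v, n=nfft))
    magnitude = magnitude / magnitude[0]  # normalize by V(0)
    band = freqs <= fc_hz
    f_band, m_band = freqs[band], magnitude[band]
    # Arc length of the normalized spectrum, with frequency scaled by the cutoff.
    arc = np.sqrt((np.diff(f_band) / fc_hz) ** 2 + np.diff(m_band) ** 2)
    return float(-arc.sum())
```

Because the frequency axis is divided by the cutoff, the result is dimensionless and does not depend on whether ωc is expressed in hertz or in rad/s.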
3.3.2. Feature Selection
To identify an effective feature subset for the proposed analysis approach, a one-way ANOVA is applied for feature selection. The statistical significance threshold is set to 0.05, and the highly significant threshold is set to 0.01.
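A minimal sketch of this selection step with SciPy’s `scipy.stats.f_oneway` is shown below, assuming a single feature matrix with one group label per sample (healthy, left-affected, right-affected). Features with p < 0.05 are retained, and those with p < 0.01 are flagged as highly significant. The function name and report format are illustrative.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_select(features, labels, names, alpha=0.05, strong_alpha=0.01):
    """One-way ANOVA per feature across participant groups."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    report = []
    for k, name in enumerate(names):
        samples = [features[labels == g, k] for g in groups]
        stat, p = f_oneway(*samples)
        level = "**" if p < strong_alpha else ("*" if p < alpha else "")
        report.append((name, float(stat), float(p), level))
    selected = [name for name, _, p, _ in report if p < alpha]
    return selected, report
```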
3.3.3. Discrete Score Movement-Based Multi-Classifier
To classify participants into three categories (healthy, post-stroke with left-side impairment, and post-stroke with right-side impairment), a set of discrete movement score-based multi-classifiers is tested using smoothness-related features. Conventional multi-class machine learning models, including Decision Tree, Random Forest, and SVM, are evaluated first. For the SVM, a linear kernel is adopted with the penalty parameter set to C = 1.0.
Deep learning-based multi-classification models are also investigated, including a feed-forward neural network and an LSTM network. For the feed-forward neural network, the hyperparameters are set as follows: epochs = 10, batch size = 5, and learning rate = 0.005. The network contains five hidden layers with 16, 32, 64, 32, and 16 units, respectively. A softmax output layer is used for multi-class prediction, and the model is trained using binary cross-entropy as the loss function.
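The feed-forward architecture can be sketched as a NumPy forward pass with the stated hidden-layer widths (16, 32, 64, 32, 16) and a three-way softmax output. The ReLU hidden activation, He-style random initialization, and six-feature input are assumptions for illustration. Training (10 epochs, batch size 5, learning rate 0.005, and the stated loss) is not shown.

```python
import numpy as np

HIDDEN_UNITS = (16, 32, 64, 32, 16)  # five hidden layers, as described
N_CLASSES = 3                        # healthy, left-affected, right-affected


def init_mlp(n_features, rng, hidden=HIDDEN_UNITS, n_classes=N_CLASSES):
    """Random (weight, bias) pairs for every layer, He-style scaling (assumed)."""
    sizes = (n_features, *hidden, n_classes)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(params, x):
    """Class probabilities for a batch of feature vectors."""
    h = np.asarray(x, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(0.0, h @ w + b)  # ReLU hidden activation (assumed)
    w, b = params[-1]
    return softmax(h @ w + b)
```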
Because participant motion signals are inherently sequential, the authors further employ a recurrent architecture (LSTM) to capture temporal dependencies and extract latent states from time-series movement data. For the LSTM model, the hyperparameters are set to epochs = 40, batch size = 3, and learning rate = 0.0001. The LSTM backbone consists of three recurrent layers with 128, 256, and 512 hidden units, followed by three fully connected feed-forward layers with 128, 256, and 512 units. The LSTM network uses ReLU as the activation function and binary cross-entropy as the loss function. The LSTM architecture is illustrated in Figure 5.
All multi-classifiers are trained using the same feature set. For the sequential model (LSTM), features are organized as a time-by-feature matrix spanning the full set of timestamps, where each feature corresponds to a vector over time. For non-sequential models, a feature vector is constructed using values at the final timestamp, with each feature represented as a scalar.
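The two input layouts can be sketched as follows, assuming each reaching segment is stored as a time-by-feature array. Zero-padding with a validity mask, used here so that variable-length segments can be batched for the LSTM, is an assumption; the source does not state how sequence lengths were equalized.

```python
import numpy as np


def build_inputs(segments):
    """Split per-segment feature trajectories into sequential and static inputs.

    segments: list of arrays shaped (T_i, n_features), one per reaching cycle.
    """
    sequences = [np.asarray(s, dtype=float) for s in segments]
    static = np.stack([s[-1] for s in sequences])  # final-timestamp scalar per feature
    return sequences, static


def pad_sequences(sequences):
    """Zero-pad variable-length sequences into (n, T_max, n_features) plus a validity mask."""
    t_max = max(s.shape[0] for s in sequences)
    n_features = sequences[0].shape[1]
    batch = np.zeros((len(sequences), t_max, n_features))
    mask = np.zeros((len(sequences), t_max), dtype=bool)
    for i, s in enumerate(sequences):
        batch[i, : len(s)] = s
        mask[i, : len(s)] = True
    return batch, mask
```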
3.3.4. Template-Based Assessment Approach
After participant classification, each patient’s movement quality is further quantified by comparing patient trajectories with those of healthy subjects. Specifically, a performance metric based on the GMM log-likelihood proposed in [30] is adopted. Probabilistic modeling is well suited for rehabilitation motion analysis because it can represent the inherent variability and stochasticity in human movement patterns. The overall workflow of the proposed quality assessment is illustrated in Figure 6.
A GMM is a probabilistic mixture model composed of multiple Gaussian probability density functions [30]. Owing to its flexibility in capturing multimodal distributions, GMM-based modeling has been widely applied to represent movement data in rehabilitation exercises [22]. The architecture of the adopted GMM and the subsequent scoring function are illustrated in Figure 7. For a GMM consisting of C Gaussian components, the corresponding probability density function is given by the following equations.
where xf represents a healthy subject’s feature data, and λ = {πc, μc, Σc} denotes the mixing coefficient, mean, and covariance of the c-th Gaussian component. Because a low likelihood under the healthy-subject model indicates a large deviation from healthy movement, the negative log-likelihood is used as the performance metric. The log-likelihood formula is given by
where Yf represents the patient’s feature data and M denotes the number of features. To compare movement quality in specific body regions between healthy subjects and patients, two GMMs were established, one for each side of movement: one was fitted to the healthy subjects’ left-side movement data and the other to their right-side movement data. The number of Gaussian components was set to three in each GMM.
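For reference, a GMM with C components has density p(x | λ) = π1 𝒩(x | μ1, Σ1) + … + πC 𝒩(x | μC, ΣC). The NumPy sketch below evaluates this density in log space (using log-sum-exp for numerical stability) and returns the per-sample negative log-likelihood of patient features under a side-specific healthy-subject model. The parameter-tuple layout and the `healthy_gmms` dictionary in the usage note are hypothetical. In practice, the three-component parameters would be fitted by expectation–maximization, for example with scikit-learn’s `GaussianMixture(n_components=3)`.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal, evaluated for each row of x."""
    mean = np.asarray(mean, dtype=float)
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))  # cov = L @ L.T
    white = np.linalg.solve(chol, (x - mean).T)              # L^-1 (x - mu), shape (d, n)
    maha = np.sum(white ** 2, axis=0)                        # squared Mahalanobis distance
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (mean.shape[0] * np.log(2.0 * np.pi) + log_det + maha)


def gmm_logpdf(x, weights, means, covs):
    """log p(x | lambda) for a Gaussian mixture, via log-sum-exp over components."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    comp = np.stack([np.log(w) + gaussian_logpdf(x, m, c)
                     for w, m, c in zip(weights, means, covs)])  # shape (C, n)
    top = comp.max(axis=0)
    return top + np.log(np.exp(comp - top).sum(axis=0))


def performance_metric(patient_features, gmm):
    """Negative log-likelihood of each patient sample under a healthy-subject GMM."""
    return -gmm_logpdf(patient_features, *gmm)
```

With `healthy_gmms = {"left": (weights, means, covs), "right": (...)}`, a left-affected patient would be evaluated with `performance_metric(features, healthy_gmms["left"])`, so that each side is compared only with the matching healthy reference.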
3.3.5. Scoring Function
To make the GMM log-likelihood-based performance metric interpretable for both therapists and patients, the authors adopt the scoring function proposed by Liao et al. [23] to transform log-likelihood values into a normalized score within the range 0–1. Let
denote the sequence of performance-metric values computed from healthy-subject movements, and let
denote the corresponding sequence computed from patient movements, where
is the total number of healthy-subject movement samples and
is the total number of patient movement samples. The scoring function is defined by the following equations.
where
,
is the standard deviation of
x.
Compared with the formulation in [27], we replace
with the mean value
in the scoring function when computing patient scores. This modification is adopted because the healthy-subject cohort and the patient cohort differ in both sample size and participant identity (i.e., they are not paired observations). In addition, the proposed scoring scheme is defined conditioned on the affected side. Specifically, the scoring function for left-affected patients uses
computed from the healthy subjects’ left-side performance metrics, while that for right-affected patients uses
computed from the healthy subjects’ right-side performance metrics. Therefore, patients are always normalized against the corresponding side-specific
associated with their impairment category. A higher score indicates that the patient’s movement quality is closer to the healthy-subject reference.
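Because the exact scoring equations are not reproduced above, the sketch below does not implement the function of Liao et al. It is a hypothetical logistic stand-in that only illustrates the side-specific normalization described in the text. Each patient’s performance metric is standardized against the mean and standard deviation of the healthy-subject metrics for the matching side and then mapped into (0, 1), so a metric equal to the healthy mean scores 0.5 and larger negative log-likelihoods score lower. The `steepness` parameter and the logistic form are assumptions.

```python
import numpy as np


def side_specific_scores(healthy_metrics, patient_metrics, side, steepness=1.0):
    """HYPOTHETICAL logistic mapping to (0, 1); not the exact function of Liao et al.

    healthy_metrics: dict mapping "left"/"right" to healthy-subject metric values.
    side: affected side predicted by the classifier for this patient.
    """
    reference = np.asarray(healthy_metrics[side], dtype=float)
    mu, sigma = reference.mean(), reference.std()
    z = (np.asarray(patient_metrics, dtype=float) - mu) / (sigma + 1e-12)
    # A larger negative log-likelihood (worse match) gives a larger z and a lower score.
    return 1.0 / (1.0 + np.exp(steepness * z))
```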
3.4. Amazon Web Services
The game platform is built on Amazon Web Services (AWS), leveraging its scalability, security, and cloud computing power to handle game data processing, analysis, and storage efficiently (Figure 8). To ensure seamless data flow, game data are transmitted directly to Amazon S3, and each upload triggers an AWS Lambda function. This function executes computational tasks such as data transformation, performance analysis, and predictive modeling.
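The S3-to-Lambda trigger can be sketched with a Python handler that reads the bucket name and object key from each record of the S3 event notification. Keys arrive URL-encoded, hence `urllib.parse.unquote_plus`. The downstream processing and database writes are deliberately left as a comment because the source does not specify them, and any bucket or key names used to exercise the handler are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 ObjectCreated event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys are URL-encoded in the notification (e.g. spaces become '+').
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda when new game data lands in S3."""
    objects = extract_s3_objects(event)
    # Placeholder: fetch each object (e.g. with boto3), compute features and scores,
    # and write results to RDS or DynamoDB. These steps are not specified here.
    return {
        "statusCode": 200,
        "body": json.dumps({"processed": [key for _, key in objects]}),
    }
```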
Once the data are processed, the refined results are stored in Amazon RDS (a relational database service) or Amazon DynamoDB (a NoSQL database), depending on the type of data. Images are kept in Amazon S3, and their URLs are stored in the database so that the web service running on Amazon EC2 can retrieve them. The EC2-hosted service retrieves the processed data and presents it through a web-based platform, allowing professionals such as therapists and clinicians to analyze player performance metrics, response times, and cognitive indicators.
The platform uses Amazon CloudWatch to continuously monitor system performance, logging key metrics and triggering alerts to maintain reliable operation. Auto-scaling mechanisms allow the platform to adjust resource allocation dynamically based on demand, improving efficiency and cost-effectiveness. This AWS-based architecture is designed to provide a secure, near-real-time, and scalable system, enabling professionals to make data-driven decisions, generate detailed reports, and enhance therapeutic or analytical outcomes through precise game-based assessments.
3.5. Statistical Analysis
All statistical evaluations were conducted to identify significant kinematic features and to validate the quality assessment models. Before a one-way ANOVA was used for feature selection, a Shapiro–Wilk test was conducted to evaluate the normality of the data distributions, because normality is an assumption of parametric testing and is particularly important to verify given the restricted sample size [49]. After normality was confirmed for the critical feature sets, the one-way ANOVA was applied. Whereas some deep learning pipelines rely solely on removing highly correlated features, ANOVA was selected in this study to explicitly quantify the variance between the healthy and affected sides, providing a statistically interpretable and clinically meaningful baseline for the multi-classifier. Additionally, Pearson’s correlation analysis was used to evaluate the linear relationship between the GMM-derived quality scores and the therapist-defined gold reference standard. Statistical significance was set at p < 0.05.
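A minimal sketch of the two remaining tests with SciPy is shown below: `scipy.stats.shapiro` checks normality per group before the ANOVA, and `scipy.stats.pearsonr` relates model scores to therapist scores at α = 0.05. The function names and the dictionary layout are illustrative assumptions.

```python
import numpy as np
from scipy.stats import pearsonr, shapiro


def normality_ok(groups, alpha=0.05):
    """Shapiro-Wilk per group: True where normality is not rejected at alpha."""
    results = {}
    for name, values in groups.items():
        _, p = shapiro(np.asarray(values, dtype=float))
        results[name] = bool(p >= alpha)
    return results


def score_agreement(model_scores, therapist_scores, alpha=0.05):
    """Pearson correlation between model-derived and therapist-defined scores."""
    r, p = pearsonr(model_scores, therapist_scores)
    return float(r), float(p), bool(p < alpha)
```

A non-rejected Shapiro–Wilk test does not prove normality, which is one reason the restricted sample size discussed above matters.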
5. Discussion
This study proposed an integrated home-based rehabilitation assessment framework for post-stroke upper-limb training. The framework first identifies the affected side by using a deep learning multi-class classifier trained on smoothness-related kinematic features, and then performs side-specific movement quality evaluation through Gaussian Mixture Model (GMM)-based scoring. By combining Kinect skeletal data, wearable IMU signals, and rehabilitation-robot force-feedback information within an AWS-enabled AIoT architecture, the proposed system supports both affected-side discrimination and continuous motion-quality quantification. The results indicate that the proposed framework can effectively distinguish left-affected, right-affected, and healthy participants, while the side-specific scoring mechanism shows promising agreement with therapist-defined reference standards, particularly for right-affected patients. These findings support the feasibility of the proposed framework as an interpretable tool for remote and home-based stroke rehabilitation assessment.
According to Table 2, the significant features extracted from the left-side and right-side skeletal data share the same selected feature combination. This consistency allows a unified feature subset to be used both to train the multi-classifier that identifies whether a patient’s affected side is left or right and to establish the two side-specific GMMs for subsequent quality assessment. In addition, incorporating AIoT-derived IMU signals (processed in near real time via AWS IoT Core) enriches the feature space with higher-resolution dynamics (e.g., angular velocity) and is associated with an approximately 5% increase in F1-score. Rehabilitation-robot force-feedback signals integrated through AWS EC2 further complement the kinematic features by capturing corrective interaction patterns, thereby improving neural network classification accuracy.
Based on Table 3, the neural network achieves the highest F1-score, and the conventional machine learning models underperform relative to this deep learning approach. Although the LSTM explicitly models temporal dependencies, its F1-score is approximately 10% lower than that of the neural network. A plausible explanation is that the adopted smoothness-based descriptors function primarily as summary statistics rather than sequence representations and therefore gain little from sequential modeling; a similar observation is reported in [22]. In addition, robot-assisted adaptive guidance informed by AIoT inputs may help stabilize motion execution, which could partially mitigate the variability that would otherwise complicate sequence-based classification.
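To illustrate why a summary descriptor leaves little for a sequence model to exploit, the sketch below computes the log dimensionless jerk (LDLJ), one widely used smoothness metric that collapses an entire movement segment into a single scalar. LDLJ is used here only as an example; whether it is among the descriptors in Table 2 is an assumption, and the 100 Hz sampling rate and synthetic speed profiles are hypothetical. The value is unchanged when the segment is played backwards, so the temporal ordering that an LSTM is designed to model is discarded.

```python
import numpy as np


def log_dimensionless_jerk(velocity, dt):
    """LDLJ of one movement segment (higher, i.e. less negative, means smoother).

    velocity: array of shape (n_samples,) or (n_samples, n_dims); dt: sample period in s.
    """
    v = np.asarray(velocity, dtype=float).reshape(len(velocity), -1)
    duration = (len(v) - 1) * dt
    speed_peak = np.linalg.norm(v, axis=1).max()
    accel = np.gradient(v, dt, axis=0)        # first derivative of velocity
    jerk = np.gradient(accel, dt, axis=0)     # second derivative of velocity
    jerk_sq_integral = np.sum(jerk ** 2) * dt  # rectangle-rule integral of |jerk|^2
    return -np.log(duration ** 3 / speed_peak ** 2 * jerk_sq_integral)


dt = 0.01
tau = np.linspace(0.0, 1.0, 101)                      # 1-s reach sampled at 100 Hz (assumed)
smooth = 30 * tau**2 - 60 * tau**3 + 30 * tau**4      # minimum-jerk speed profile
jerky = smooth + 0.05 * np.sin(2 * np.pi * 8 * tau)   # same reach with superimposed ripples

print(log_dimensionless_jerk(smooth, dt) > log_dimensionless_jerk(jerky, dt))  # True
# Reversing the segment in time yields the same descriptor value.
print(np.isclose(log_dimensionless_jerk(jerky, dt),
                 log_dimensionless_jerk(jerky[::-1], dt)))  # True
```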
Regarding Table 4, combining the features drawn from all three prior studies yields the highest neural network F1-score, suggesting that a more diverse feature representation improves the characterization of movement behavior. This result suggests that incorporating additional feature designs from the literature may further strengthen multi-classifier performance.
According to Table 6, a significant correlation between the therapist-defined gold reference standard and the proposed quality assessment scores is found for patients with right-side impairment. Robot-based real-time corrections, which help minimize trajectory deviations during task execution, and AIoT-enabled dynamic score updates visualized via AWS Amplify likely contribute to this outcome by stabilizing movement patterns and providing immediate, intuitive feedback. These findings suggest that, for right-affected patients, the dynamic reaching task can serve as a viable home-based rehabilitation exercise whose interpretable quality feedback approximates the utility of clinical evaluation. However, the corresponding correlation is not confirmed for left-affected patients. Because the two subgroups are similarly small (2 left-affected and 3 right-affected patients), sample size alone cannot fully explain this discrepancy. Instead, the divergence is likely driven by two clinical and biomechanical factors. First, impairment severity and recovery stage may be more heterogeneous among the left-affected patients; with such a restricted sample, large inter-subject variability can easily obscure statistical correlations. Second, the dynamic reaching task, which involves rapid target interception, may be inherently better suited to assessing the dominant side. Because all participants in this study (both healthy and post-stroke) were right-handed, the kinematic baselines established from the healthy right hands reflect dominant-limb motor control, which naturally exhibits superior coordination and responsiveness. Consequently, assessing the non-dominant (left) side with a highly dynamic task might introduce baseline variance related to natural non-dominance rather than strictly to stroke-induced impairment.
Future work should recruit a balanced cohort of left- and right-handed individuals and develop handedness-adjusted scoring models to decouple stroke-induced impairment from natural limb dominance.
Limitations
Despite its demonstrated technical feasibility in side-specific modeling and IoT-integrated robotic assessment, the proposed framework has several critical limitations. First, the sample size of this pilot study is extremely limited (N = 5 patients; N = 3 healthy controls). Consequently, statistical power is constrained, and the generalizability of the clinical conclusions should be interpreted with caution. Second, there is a substantial age gap between the young healthy control group and the older stroke patient cohort. Because advancing age independently affects motor kinematics, for example by reducing movement speed and smoothness, the current healthy reference templates may overestimate the degree of stroke-induced impairment. The present findings therefore primarily validate the computational pipeline and system architecture. Future full-scale clinical trials should recruit larger, age-matched healthy control groups to establish rigorous and unbiased normative baselines for the GMM scoring function.
Regarding the reliability of the deep learning models and the GMM, training complex architectures on small clinical cohorts carries a risk of limited generalizability. To mitigate this within the present experimental design, segment-level kinematic data from high-frequency IoT sensor streams were used, effectively multiplying the number of training instances, although segments from the same participant remain correlated. Furthermore, the GMM was adopted for the quality assessment stage because probabilistic density models are relatively robust in modeling multi-modal variability even when the data scale is moderate. Nevertheless, the current machine learning models are presented to validate the computational feasibility of the proposed AIoT pipeline. Expanding the dataset through multi-center trials remains a critical next step toward establishing the generalizability of the trained network weights.
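The segment-level expansion of the training data can be sketched as a sliding window over each sensor stream. Because overlapping windows from one participant are strongly correlated, one common safeguard is to evaluate with leave-one-subject-out splits so that no participant contributes segments to both the training and test sets. The helper names, window length, step, 100 Hz rate, and six-channel IMU layout below are hypothetical; scikit-learn's `LeaveOneGroupOut` offers an equivalent splitter.

```python
import numpy as np


def segment_stream(signal, window, step):
    """Cut one (n_samples, n_channels) stream into overlapping windows.

    Returns an array of shape (n_windows, window, n_channels).
    """
    signal = np.asarray(signal)
    if len(signal) < window:
        raise ValueError("stream is shorter than one window")
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])


def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx) pairs that hold out one subject at a time."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        yield (np.flatnonzero(subject_ids != subject),
               np.flatnonzero(subject_ids == subject))


# Hypothetical data: three participants, 10 s of 6-channel IMU data at 100 Hz each.
rng = np.random.default_rng(0)
windows, ids = [], []
for subject in ("P1", "P2", "P3"):
    segs = segment_stream(rng.normal(size=(1000, 6)), window=200, step=100)
    windows.append(segs)
    ids += [subject] * len(segs)
X = np.concatenate(windows)  # 2-s windows with 50% overlap: 9 per stream, 27 in total

for train_idx, test_idx in leave_one_subject_out(ids):
    print(len(train_idx), len(test_idx))  # 18 9 for each held-out participant
```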