Automatic Post-Stroke Severity Assessment Using Novel Unsupervised Consensus Learning for Wearable and Camera-Based Sensor Datasets

Stroke survivors often suffer from movement impairments that significantly affect their daily activities. Advancements in sensor technology and the IoT have provided opportunities to automate the assessment and rehabilitation process for stroke survivors. This paper aims to provide a smart post-stroke severity assessment using AI-driven models. In the absence of labelled data and expert assessment, there is a research gap in providing virtual assessment, especially for unlabelled data. Inspired by advances in consensus learning, in this paper we propose a consensus clustering algorithm, PSA-NMF, that combines various clusterings into one united clustering, i.e., a cluster consensus, to produce more stable and robust results than individual clusterings. This paper is the first to investigate severity level using unsupervised learning and trunk displacement features in the frequency domain for post-stroke smart assessment. Two different methods of data collection from the U-limb datasets were used: the camera-based method (Vicon) and wearable sensor-based technology (Xsens). The trunk displacement method labelled each cluster based on the compensatory movements that stroke survivors employed in their daily activities. The proposed method uses the position and acceleration data in the frequency domain. Experimental results demonstrated that the proposed consensus clustering method for post-stroke assessment improved evaluation metrics such as accuracy and F-score compared with the individual clustering methods. These findings can lead to a more effective and automated stroke rehabilitation process suitable for clinical settings, thus improving the quality of life for stroke survivors.


Introduction
Stroke is the third leading cause of disability. As measured by disability-adjusted life years (DALYs) lost, 143 million DALYs worldwide in 2019 were attributable to stroke. Nearly 60% of post-stroke patients with upper-limb hemiparesis in the severe stage experience chronic major functional impairment [1]. Rehabilitation is highly recommended for stroke survivors as one of the most effective treatments to accelerate the recovery of motor functions in the affected parts of their bodies [2,3]. The first step in a rehabilitation process is the assessment of the affected body part for rehabilitation planning and strategy. Traditionally, such assessments are self-report-based or conducted according to an expert decision. The Fugl-Meyer Assessment (FMA) is one such assessment, in which clinicians measure sensorimotor impairment and functional movements of the body associated with the range of motion, muscles, and joints. It also measures levels of severity in stroke survivors. FMA-UE refers to the FMA score for the Upper Extremities, which consists of 33 tasks scored between 0 and 2 points [2]. If a patient performs the required task fully or partially, or is unable to perform the task, the assigned point value will be 2, 1, or 0, respectively. The sum of the points from

• The function of the affected hand in post-stroke patients (level of severity) was investigated using unsupervised learning.
• The general movements categorized as activities of daily living, such as holding a cup and drinking, eating apples, answering the phone, etc., were utilized.
• For the first time, position data in the frequency domain were used in addition to the acceleration data.
• The novel labeling method for each cluster using trunk displacement is one of the main contributions made by this study.
• In the study, the proposed method investigated not only wearable datasets but also camera-based datasets.
This paper is organized as follows. Section 2 describes the related works. Clustering analysis and consensus learning are discussed in Sections 3 and 4, respectively. The proposed assessment model using consensus-based clustering is demonstrated in Section 5. The materials and methods, preprocessing, and the proposed labeling method are described in Section 6. The data preprocessing is shown in Section 7. The experimental results are presented in Section 8. Furthermore, a discussion of the results is offered in Section 9. Finally, the conclusion and future work are presented in Section 10.

Wearable Sensors
The authors in [1] chose 23 stroke patients with FMA-UE scores of less than 30, which indicates severe impairment, and used unsupervised learning to determine homogeneous movements, outlier movements, and all movement components. A study by [26] used the Xsens wearable system and 17 sensors to define the correlation between the FMA score and body movements while patients performed daily activities. The group in [27] developed a two-IMU system to assess the outcomes of rehabilitation treatments. However, these two studies did not implement machine learning techniques. Additionally, Refs. [28,29] estimated the functional ability scale using accelerometer sensors attached to the arm, upper arm, and hand. The Random Forest technique used the acceleration data to predict FMA scores. In addition to the accelerometer, flex sensors were attached to the body in the [21] study to monitor patient movements. The Extreme Learning Machine (ELM) was applied to predict FMA scores [21]. Researchers employed rule-based classification [37] to estimate each patient's FMA score using accelerometer sensors affixed to the upper arm and forearm. A summary of the work conducted using wearable sensors is presented in Table 1. In the literature, the assessment type has been investigated based on three categories: clinical emulation [1,19–23,30,38–40], movement classification [3,17,18,25,41–43,50–54], and activity recognition [15,16,24,44,48,49,55–59]. As this study focused on clinical emulation, Table 1 describes only the summary of this category. Several studies deployed individual accelerometers [19,20,22,37], and some studies combined IMUs with different sensors such as flex sensors [60,61]. While some studies used healthy participants [20,54,62], others used stroke patients as participants [1,22,23,38–40].
Support Vector Regression (SVR) was the most used classifier [21,22,40]; it was primarily used in regression problems for clinical assessments where participants were given clinical scores [1,19–22,30,37,38,40]. A few researchers also employed the Random Forest (RF) technique [19,20,30]. In a few studies, unsupervised machine learning was applied using the k-means [32] and DBSCAN [1] methods. The work in [32] investigated the Functional Ability Scale (FAS) assessment test using k-means clustering and demonstrated that the clustering results correlated with the movement quality examined by the FAS. The authors in [45] used hierarchical clustering to define the level of severity based on FMA scores. No study has directly investigated the severity levels of strokes using unsupervised AI-driven models without available labeling.

Camera-Based Sensors
The summary of the studies developed using camera-based sensors is described in Table 2. The Kinect camera is one of the cameras used in motion capture research [63,64]. The Kinect camera has also been combined with the Myo armband [9,65], force-sensing resistors [5,35], and pressure or glove sensors [34]. Different supervised machine learning techniques have been used, such as Artificial Neural Networks [33], SVM [34,66], and rule-based classification techniques [5,66]. The Vicon camera (Vicon Motion System Ltd., Oxford, UK) is also used to capture human motion, and the kinematic data can be derived using the Nexus software. However, no study has investigated the severity levels of strokes using unsupervised learning and trunk displacement features in the frequency domain for post-stroke smart assessment. In this study, the compensatory movements, or trunk displacements, were used to label each cluster and were then compared with the ground-truth FMA scores provided by expert clinicians. The clustering of data from two different datasets (camera and wearable) in the frequency domain using position and linear acceleration features is an additional novelty and contribution of this paper.

Clustering Analysis
Clustering methods fall into different categories based on their properties: for example, hard, soft, distance-based, and density-based clustering. Eight baseline clustering methods are employed: Fuzzy C-means, K-means, the Self-Organizing Map (SOM), Gaussian Mixture Models, DBSCAN, and hierarchical, spectral, and OPTICS clustering.

K-Means Clustering
The K-means method [67] aims to minimize the sum of squared distances between each data point and the centroid of its assigned cluster, as defined in Equation (1):

J = \sum_{i=1}^{N} \sum_{j=1}^{k} C_{ij} \| X_i - \mu_j \|^2    (1)

Here, N is the number of data points, k is the number of clusters, C_{ij} is the (i,j)-th element of the assignment matrix C (it equals 1 if data point i belongs to cluster j, and 0 otherwise), \mu_j is the centroid of the j-th cluster (i.e., the average of all data points assigned to cluster j), and \| X_i - \mu_j \|^2 is the squared Euclidean distance between data point X_i and centroid \mu_j.
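As an illustrative sketch of this objective (the two-blob toy data and all parameter choices here are ours, not the paper's), scikit-learn's KMeans minimizes exactly this within-cluster sum of squares:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for movement-feature vectors: two well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

# KMeans minimizes sum_i sum_j C_ij * ||X_i - mu_j||^2 (Equation (1)).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels, centroids = km.labels_, km.cluster_centers_
```

Because the blobs are well separated, each one is recovered as its own cluster.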

Fuzzy C-Means Clustering
The cluster centers "c" are randomly initialized, and the probability of cluster membership of the i-th data point in the j-th cluster is calculated thus [67]:

\mu_{ij} = \frac{1}{\sum_{k=1}^{c} ( d_{ij} / d_{ik} )^{2/(m-1)}}

Here, m is the fuzziness parameter or factor, c denotes the number of clusters, d_{ij} indicates the Euclidean distance between the i-th data point and the j-th cluster center, and \mu_{ij} represents the degree of membership of the i-th data point in the j-th cluster. After allocating data points to the j-th cluster, the center of each cluster is defined thus:

c_j = \frac{\sum_{i} \mu_{ij}^{m} x_i}{\sum_{i} \mu_{ij}^{m}}

This iteration continues until the following objective is minimal:

J = \sum_{i} \sum_{j} \mu_{ij}^{m} d_{ij}^{2}
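The alternating membership and centre updates can be sketched in plain NumPy as follows (the toy data and the deterministic extreme-point initialization are illustrative choices, not the paper's):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50):
    """Minimal Fuzzy C-means sketch: alternate the membership update
    u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)) with the centre update
    c_j = sum_i u_ij^m x_i / sum_i u_ij^m."""
    # Simple deterministic init: c centres spread along the x-axis order.
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, c).astype(int)]].astype(float)
    for _ in range(iters):
        # d[i, j]: distance of point i to centre j (small floor avoids /0)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
    return u, centers

# Toy usage: two separated blobs standing in for movement features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(4, 0.3, (30, 2))])
u, centers = fuzzy_c_means(X, c=2)
hard_labels = u.argmax(axis=1)   # defuzzified (hard) assignment
```

Each row of `u` sums to 1, which is what distinguishes this soft assignment from the hard K-means indicator matrix.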

SOM Clustering
Let X = {x_1, x_2, . . . , x_n} be the set of input vectors and W = {w_1, w_2, . . . , w_m} be the set of weight vectors for the nodes in the grid. SOM clustering tries to minimize the objective function J(W) (Equation (5)) and update the weights of the nodes in the grid (Equation (6)):

J(W) = \sum_{i=1}^{n} \| x_i - w_{c_i} \|^2    (5)

w_j(t+1) = w_j(t) + \eta(t)\, h(c_i, j, t)\, (x_i - w_j(t))    (6)

Here, t is the iteration number, \eta(t) is the learning rate at iteration t, h(c_i, j, t) is the neighbourhood function that determines the influence of the input vector x_i on the weight vector w_j based on the distance between node j and the winning node, and c_i is the index of the winning node in the grid for the input vector x_i [68-70].
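A minimal one-dimensional SOM following this update rule can be sketched as below (the grid size, decay schedules, and toy data are our illustrative assumptions):

```python
import numpy as np

def train_som(X, n_nodes=4, iters=300, seed=0):
    """Minimal 1-D SOM sketch: at each step, find the winning node for a
    random sample and pull the winner and its grid neighbours toward it,
    with a decaying learning rate and a shrinking neighbourhood."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_nodes, replace=False)].astype(float)
    grid = np.arange(n_nodes)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        c = int(np.argmin(np.linalg.norm(W - x, axis=1)))    # winning node c_i
        eta = 0.5 * (1.0 - t / iters)                        # learning rate eta(t)
        sigma = max((n_nodes / 2.0) * (1.0 - t / iters), 0.5)
        h = np.exp(-((grid - c) ** 2) / (2.0 * sigma ** 2))  # neighbourhood h(c_i, j, t)
        W += eta * h[:, None] * (x - W)                      # weight update
    return W

# Toy usage: two separated blobs standing in for movement features.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(4, 0.3, (30, 2))])
W = train_som(X)
# Quantization error: mean distance from each sample to its nearest node.
q_err = np.mean(np.min(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1))
```

After training, the nodes spread over the data so the quantization error (the objective J(W) per sample) becomes small.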

Hierarchical Clustering
In hierarchical clustering, the relationship between clusters is based on a hierarchy, and a dendrogram of the included clusters is the output. In the agglomerative variant, the process initially considers each data point as its own cluster, and the hierarchical levels are then developed based on a distance matrix: the distance between each pair of clusters is calculated, and the closest pair is selected and merged. This process is repeated iteratively until all clusters have been merged (or, in the divisive variant, split). The objective of optimization in hierarchical clustering is to find the optimal hierarchy of nested clusters that best represents the underlying structure of the data. The quality of the clustering solution can be evaluated using external or internal validation metrics such as the adjusted Rand index, the silhouette score, or the cophenetic correlation coefficient [71].
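The merge-and-cut process, including the cophenetic correlation mentioned above, can be sketched with SciPy (the toy data and the Ward linkage choice are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(4, 0.3, (30, 2))])

dists = pdist(X)                        # condensed pairwise distance matrix
Z = linkage(dists, method="ward")       # bottom-up merge tree (the dendrogram)
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 flat clusters
coph_corr, _ = cophenet(Z, dists)       # cophenetic correlation (internal validation)
```

On well-separated data, the cophenetic correlation is high, meaning the dendrogram distances faithfully reproduce the original pairwise distances.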

Spectral Clustering
Spectral clustering is a graph partitioning problem that identifies node neighbourhoods and the edges connecting them based on graph theory. This method involves transforming the data into a new representation using the eigenvalues and eigenvectors of a matrix derived from the data. The process begins by constructing a similarity graph or matrix that captures the pairwise similarity or dissimilarity among each pair of data points. The graph can be constructed in different ways depending on the problem, but the most common approach is to use a Gaussian kernel function to measure the similarity between two data points based on their distance. Once the similarity matrix is constructed, the eigenvectors and eigenvalues of the matrix are computed using techniques from linear algebra. The k eigenvectors corresponding to the k smallest eigenvalues are then used to represent the data in a lower-dimensional space. This is conducted by treating the eigenvectors as new coordinates for the data points. Spectral clustering can be computationally expensive and requires the choice of several parameters such as the number of clusters and the similarity measure [72].
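The Gaussian-kernel similarity graph and eigenvector embedding described above can be sketched with scikit-learn (the toy data and the `gamma` value are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(4, 0.3, (40, 2))])

# "rbf" builds the Gaussian-kernel similarity matrix; the graph Laplacian's
# leading eigenvectors embed the points before a final k-means step.
sc = SpectralClustering(n_clusters=2, affinity="rbf", gamma=1.0,
                        assign_labels="kmeans", random_state=0)
labels = sc.fit_predict(X)
```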

Gaussian Mixture Models Clustering
Gaussian Mixture Model (GMM) clustering is a soft clustering method based on probability density estimation that applies the Expectation-Maximization algorithm and creates ellipsoid-shaped clusters. A GMM is composed of several Gaussian distributions, each with its own mean (center) and covariance. The mixing probability, which defines the size of each Gaussian component, is also determined in GMM clustering; this gives the method the ability to deliver a numerical membership quantity per cluster, in contrast to hard clustering methods such as K-means. Given a dataset X = {x_1, x_2, ..., x_N} and a GMM with K Gaussian components, the goal is to find the optimal values of the parameters \theta = {w_1, ..., w_K, \mu_1, ..., \mu_K, \Sigma_1, ..., \Sigma_K}, where w_k is the weight of the k-th component, \mu_k is its mean vector, and \Sigma_k is its covariance matrix. The optimization problem aims to maximize the log-likelihood of the observed data, as depicted below:

\log L(\theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} w_k\, N(x_i \mid \mu_k, \Sigma_k)

Here, N(x_i | \mu_k, \Sigma_k) is the Gaussian probability density function with mean \mu_k and covariance matrix \Sigma_k evaluated at data point x_i. The Expectation-Maximization (EM) algorithm is used to find the optimal values of these parameters. The algorithm alternates between the E-step and the M-step. In the E-step, the posterior probability \gamma_{ik} of each data point x_i belonging to each component k is computed thus:

\gamma_{ik} = \frac{w_k\, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} w_l\, N(x_i \mid \mu_l, \Sigma_l)}

Here, the denominator \sum_{l=1}^{K} w_l\, N(x_i \mid \mu_l, \Sigma_l) is the total probability of data point i across all components.
In the M-step, the parameters are updated as follows:

w_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}, \quad \mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik}\, x_i}{\sum_{i=1}^{N} \gamma_{ik}}, \quad \Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T}{\sum_{i=1}^{N} \gamma_{ik}}

Here, N is the total number of data points. The EM algorithm iteratively updates the parameters until convergence is reached. The final set of parameters represents the optimal solution to the GMM clustering problem [73,74].
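The E-step/M-step alternation can be sketched with scikit-learn, which implements EM for GMMs; the soft posteriors returned by `predict_proba` correspond to the responsibilities above (the toy two-blob data is illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(4, 0.3, (40, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X)   # EM runs inside fit()
gamma = gmm.predict_proba(X)   # E-step posteriors (soft memberships)
labels = gamma.argmax(axis=1)  # hard assignment for comparison
```

Each row of `gamma` sums to 1, which is precisely the soft-membership property that distinguishes GMM from hard clustering.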

DBSCAN Clustering
The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm does not require specifying the number of clusters in advance, and can handle clusters of arbitrary shape. DBSCAN defines two parameters: the minimum number of points (MinPts) required to form a dense region and a distance threshold (eps) that determines the size of the neighborhood around each data point. Let D be the dataset of n data points {x 1 , x 2 , ..., x n } and let eps and MinPts be the distance threshold and minimum number of points, respectively. The DBSCAN algorithm optimizes the clustering by finding the following sets of points:

1. Core points: A point x in D is a core point if it has at least MinPts points in its eps-neighborhood, including itself.
2. Border points: A point y in D is a border point if it is not a core point but has at least one core point within its eps-neighborhood.
3. Noise points: A point z in D is a noise point if it is neither a core nor a border point.
The DBSCAN algorithm groups the core points and their border points into clusters. Two core points belong to the same cluster if they are directly or indirectly reachable from each other through a series of core points. DBSCAN tries to maximize the number of points assigned to a cluster while minimizing the number of noise points. DBSCAN adjusts the parameters eps and MinPts to find the optimal clustering [1,75].
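As an illustrative sketch (the toy data, including a planted outlier, and the eps/MinPts values are our choices), scikit-learn's DBSCAN exposes these two parameters directly and marks noise points with the label -1:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.2, (40, 2)),
               rng.normal(4, 0.2, (40, 2)),
               [[10.0, 10.0]]])            # one far-away outlier

db = DBSCAN(eps=0.5, min_samples=5).fit(X)  # min_samples plays the role of MinPts
labels = db.labels_                          # -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

The two dense blobs become clusters, while the isolated point has no eps-neighbors and is reported as noise.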

OPTICS Clustering
OPTICS, which stands for 'ordering points to identify the clustering structure', is another density-based clustering method similar to DBSCAN. However, the reachability-distance plot in OPTICS is an advancement over DBSCAN. The reachability distance of a point q from a core object p is the maximum of the core distance of p and the distance between p and q. OPTICS uses a priority queue data structure to efficiently order the points based on their densities and avoid the computation of pairwise distances between all points, which can be computationally expensive. Instead, the priority queue allows the algorithm to consider only the points relevant to the current point being processed. Another optimization used in OPTICS clustering is a data structure called a core distance tree, which stores the core distances of all points in the dataset. The core distance of a point is the minimum distance at which the point can be considered part of a dense region. The core distance tree efficiently computes the reachability distances between points in the dataset [76].
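The ordering and reachability plot can be sketched with scikit-learn's OPTICS (the toy data, the DBSCAN-style label extraction, and the eps value are illustrative choices):

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.2, (40, 2)), rng.normal(4, 0.2, (40, 2))])

# Build the density ordering, then extract DBSCAN-equivalent labels at eps=0.5.
opt = OPTICS(min_samples=5, cluster_method="dbscan", eps=0.5).fit(X)
reachability = opt.reachability_[opt.ordering_]  # valleys in this plot = clusters
labels = opt.labels_
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Plotting `reachability` would show one valley per dense region; the two blobs appear as two valleys.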

The Consensus Solvers
Consensus clustering combines multiple clusterings of the same data set into a single consensus clustering solution to obtain a more stable and robust clustering solution that captures the common structure across the different clusterings.

Meta-Clustering Algorithm (MCLA) Consensus Solver
The Meta-Clustering Algorithm (MCLA) is centered around clustering clusters and provides confidence estimates for object membership within clusters. The performance of MCLA depends on the choice of clustering algorithms, the combination method, and the quality of the individual clusterings. Its first goal is to transform the provided cluster label vectors into a hypergraph representation appropriate for subsequent analysis. Any set of clusterings can be mapped to a hypergraph composed of vertices and hyperedges. The hyperedge is a matrix of binary membership indicators. Each column represents a cluster, and the rows corresponding to objects with unknown labels are populated with zeros in the indicator matrix, while 1 denotes an object with a known label. As a result, each cluster can be mapped to a hyperedge, and the set of clusterings can be represented as a hypergraph. For the MCLA, clusters are represented as hyperedges. The MCLA aims to group and merge related hyperedges, assigning objects to the resulting collapsed hyperedge based on their strongest participation. A graph-based clustering approach achieves the determination of related hyperedges for collapsing.
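The binary hyperedge indicator matrix described above can be sketched as follows (the toy label vectors and the -1 encoding for unknown labels are illustrative assumptions):

```python
import numpy as np

def clusterings_to_hypergraph(labelings, n_objects):
    """Map a set of clusterings to a binary hyperedge indicator matrix H:
    one column per cluster (hyperedge), with H[i, e] = 1 iff object i lies
    in hyperedge e. Objects with unknown labels (encoded here as -1) get
    all-zero entries in that clustering's columns."""
    columns = []
    for labels in labelings:
        labels = np.asarray(labels)
        for c in sorted(set(labels.tolist()) - {-1}):
            col = np.zeros(n_objects, dtype=int)
            col[labels == c] = 1
            columns.append(col)
    return np.column_stack(columns)

# Two toy clusterings of 6 objects: one with 2 clusters, one with 3.
H = clusterings_to_hypergraph([[0, 0, 0, 1, 1, 1],
                               [0, 0, 1, 1, 2, 2]], n_objects=6)
```

Each clustering contributes one block of columns, so with no unknown labels every row sums to the number of base clusterings.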

HyperGraph Partitioning Algorithm (HGPA) Consensus Solver
The HyperGraph Partitioning Algorithm (HGPA) directly repartitions the data by leveraging the existing clusters as indicators of strong associations. In addition, it can be expressed as partitioning a hypergraph by removing the smallest number of hyperedges. In the HGPA, the objective is to partition the hypergraph into a set of l disjoint subgraphs (called partitions) such that some criterion is optimized. The HGPA works by constructing a weighted hypergraph where each vertex represents a hyperedge and the weight of each vertex is proportional to the cost of assigning it to a specific partition [77].

Cluster-based Similarity Partitioning Algorithm (CSPA) Consensus Solver
The CSPA partitions a given dataset into distinct clusters by analyzing the similarities between the objects in the dataset. The algorithm operates by building a similarity matrix that evaluates the similarities between pairs of objects in the dataset. The choice of similarity measure can be any distance metric or appropriate similarity measure for the data. After constructing the similarity matrix, the algorithm initially groups clusters by clustering objects that display high pairwise similarities. Subsequently, this initial set of clusters is improved using a hierarchical clustering algorithm that iteratively combines pairs of clusters with high similarities until the desired number of clusters is obtained. When merging the clusters, the algorithm leverages a cluster similarity measure to determine which pairs of clusters should be combined in situations where the weights are proportionate to the sizes of the clusters [77].
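A CSPA-style pipeline can be sketched by averaging pairwise co-association over the base clusterings and then hierarchically clustering the resulting similarity matrix (the toy label vectors and the average-linkage choice are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cspa(labelings, n_clusters):
    """CSPA-style consensus sketch: S[i, j] is the fraction of base
    clusterings that put i and j in the same cluster; 1 - S is then
    clustered hierarchically into the desired number of clusters."""
    labelings = np.asarray(labelings)
    S = np.mean([np.equal.outer(l, l) for l in labelings], axis=0)
    D = 1.0 - S                      # turn similarity into a distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Three noisy base clusterings of 6 objects; they mostly agree.
labels = cspa([[0, 0, 0, 1, 1, 1],
               [0, 0, 1, 1, 1, 1],
               [1, 1, 1, 0, 0, 0]], n_clusters=2)
```

Even though one base clustering misassigns object 2, the averaged co-association recovers the majority grouping.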

Hybrid Bipartite Graph Formulation (HBGF) Consensus Solver
The Hybrid Bipartite Graph Formulation (HBGF) creates a bipartite graph of vertices and edges. An edge will only connect an instance vertex to a cluster vertex if that instance belongs to that cluster. New cluster vertices are added if a fresh clustering is incorporated into the ensemble, with these vertices linked to the instances they contain. Instances are illustrated by round vertices and clusters by diamond-shaped vertices. The graph's edges each possess a single weight, with any edges possessing zero weights being excluded. The bipartite graph is partitioned into non-overlapping clusters using a partitioning algorithm such as spectral clustering or the k-means method. The goal of the partitioning algorithm is to minimize the number of cut edges between clusters while balancing the weights of the vertices [78]. Figure 1 presents the HBGF consensus clustering using two individual clusterings: K-means and Fuzzy C-means clustering.

The Proposed Post-Stroke Severity Assessment Model using Modified NMF-Consensus Solver (PSA-MNMF)
The proposed consensus clustering algorithm, PSA-MNMF, combines various clusterings into one united clustering, i.e., creating a cluster consensus, to produce more stable and robust results than any individual clustering method. We developed a modified nonnegative matrix factorization (MNMF) [79] as an enhanced consensus solver by factorizing the consensus matrix into two nonnegative matrices that represent the underlying structure of the data. In our algorithm, once the MNMF phase is completed, an exhaustive search is executed to find the optimal combinations of the consensus toward robust results. Each of these methodologies is described next.

The Modified Nonnegative Matrix Factorization (MNMF) Consensus Solver
Assume A = {a_1, . . . , a_n} is the set of n data points. Then, B = {b^1, b^2, . . . , b^E} is the set of E partitions of A, and the set of clusters in the t-th partition is D^t = {d^t_1, d^t_2, . . . , d^t_m}, where m denotes the number of clusters in partition b^t. The number of clusters m can differ for each partition, and A = \cup_{h=1}^{m} d^t_h. Now, the distance between two partitions b^1 and b^2 can be defined based on the MNMF solver as follows:

d(b^1, b^2) = \sum_{i,j} f_{ij}(b^1, b^2)

For each pair of data points (i, j), the connectivity matrix of a partition b is defined as follows:

CM_{ij}(b) = 1 if i and j belong to the same cluster of b; CM_{ij}(b) = 0 otherwise

The f_{ij}(b^1, b^2) values are defined according to the connectivity matrices as follows:

f_{ij}(b^1, b^2) = | CM_{ij}(b^1) - CM_{ij}(b^2) |

The f_{ij}(b^1, b^2) value is 0 or 1. The consensus clustering b* can be derived thus:

b^* = \arg\min_{b} \sum_{t=1}^{E} d(b^t, b)

Assuming that the solution of the optimization equation above is U_{ik} = CM_{ik}(B^*), the average consensus association between i and k is as follows:

\overline{CM}_{ik} = \frac{1}{E} \sum_{t=1}^{E} CM_{ik}(b^t)

The average squared difference from the consensus association is as follows:

\Delta CM^2 = \frac{1}{E} \sum_{t=1}^{E} \sum_{i,k} \left( CM_{ik}(b^t) - \overline{CM}_{ik} \right)^2

The smaller \Delta CM^2 is, the closer to each other the partitions are. The \Delta CM^2 term is constant, and therefore the optimization problem for consensus clustering is expressed as follows:

\min_{B^*} \sum_{i,k} \left( \overline{CM}_{ik} - CM_{ik}(B^*) \right)^2 = \min_{U} \| \overline{CM} - U \|_F^2

This norm is called the Frobenius norm. This equation indicates that consensus clustering reduces to approximating the average consensus association. Now, the next task is to define an NMF clustering solver. The nonnegative matrix factorization (NMF) solver factorizes nonnegative data A into two nonnegative matrices P and Q (A = PQ). For the NMF formulation, a clustering indicator matrix N ∈ {0,1}^{n×k} can represent a clustering solution; by definition, the constraint on this matrix is that each row contains exactly one "1" and the rest of its elements are zeros.
If i and k belong to the same cluster, then, since the (i,k) entry of NN^T is the dot product of rows i and k of N, (NN^T)_{ik} = 1; otherwise, (NN^T)_{ik} = 0. With the constraint U = NN^T, the consensus clustering is optimized as follows:

\min_{N} \| \overline{CM} - NN^T \|_F^2

Note that (N^T N)_{jl} = 0 when j ≠ l, and that (N^T N)_{jj} = |C_j| = r_j, where C_j denotes cluster j; the matrix N^T N therefore contains only the non-zero values r_j on its diagonal. Let L = diag(N^T N) = diag(r_1, r_2, . . . , r_k), so that N^T N = L. Then, the optimization problem for this case can be written as follows:

\min_{\tilde{N}} \| \overline{CM} - \tilde{N} L \tilde{N}^T \|_F^2, where \tilde{N} = N L^{-1/2} and \tilde{N}^T \tilde{N} = I

This optimization can be easier to solve compared to Equation (21). However, the cluster sizes would need to be defined in advance; therefore, the cluster-size constraint should be eliminated.
Then, the optimization equation is as depicted:

\min_{\tilde{N} \geq 0,\, Y \geq 0} \| \overline{CM} - \tilde{N} Y \tilde{N}^T \|_F^2, subject to \tilde{N}^T \tilde{N} = I

Both \tilde{N} and Y are acquired as solutions of this problem. This formulation demonstrates that consensus clustering can be defined as a symmetric nonnegative matrix factorization. Given \tilde{N}^T \tilde{N} = I, the factors are updated at each iteration with multiplicative update rules until convergence. In conclusion, the above explains the formulation of the algorithm in [78]. There, L was restricted to being a diagonal matrix, whereas in this equation Y is not restricted to being a diagonal matrix. However, if at some point Y becomes a diagonal matrix, it will remain a diagonal matrix thereafter.
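As a rough, hedged sketch of the symmetric-factorization idea (using the simpler U ≈ H H^T form rather than the tri-factor N Y N^T formulation above; the damping constant, iteration count, and toy consensus matrix are all our choices):

```python
import numpy as np

def sym_nmf(U, k, iters=500, seed=0):
    """Symmetric NMF sketch: factor a consensus matrix U ≈ H @ H.T with
    H >= 0, via damped multiplicative updates
    H <- H * (0.5 + 0.5 * (U H) / (H H^T H)); the 0.5 damping keeps
    the iteration stable."""
    rng = np.random.default_rng(seed)
    H = rng.random((U.shape[0], k))
    for _ in range(iters):
        H *= 0.5 + 0.5 * (U @ H) / np.maximum(H @ (H.T @ H), 1e-12)
    return H

# Toy block-diagonal consensus matrix: objects 0-4 always co-clustered,
# objects 5-9 always co-clustered, never across the two groups.
U = np.zeros((10, 10))
U[:5, :5] = 1.0
U[5:, 5:] = 1.0
H = sym_nmf(U, k=2)
consensus = H.argmax(axis=1)   # hard consensus assignment per object
```

On this idealized consensus matrix the factor columns converge toward scaled cluster indicators, so the row-wise argmax recovers the two blocks.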

Exhaustive Search
An exhaustive search phase was conducted in this study to evaluate every potential combination in order to obtain the most desirable solution. We searched for the best performance among the combinations of baseline clustering methods within the consensus solver by attempting every possible combination. The model was run 100 times to decrease the variance, and the best 10 combinations were recorded each time. The 10 best performances (quantified by a suitable assessment measure, such as the F-score) were reported, and the combination that appeared most often among them was selected for the final results.
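The enumeration over method combinations can be sketched as follows (the method names and the `score_fn` placeholder standing in for a consensus quality measure such as the F-score are illustrative assumptions):

```python
from itertools import combinations

def exhaustive_best(methods, score_fn, top_k=10):
    """Exhaustive search sketch: score every non-empty combination of
    baseline clusterers (score_fn is a placeholder for, e.g., the
    consensus F-score) and keep the top_k combinations."""
    scored = [(score_fn(combo), combo)
              for r in range(1, len(methods) + 1)
              for combo in combinations(methods, r)]
    return sorted(scored, key=lambda t: t[0], reverse=True)[:top_k]

# Illustrative scorer: pretend larger ensembles always score higher.
best = exhaustive_best(["kmeans", "fcm", "som", "gmm"], score_fn=len, top_k=3)
```

In practice the search is repeated over many runs and the most frequently recurring top combination is kept, as described above.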

The PSA-MNMF Consensus Clustering Algorithm
The PSA-MNMF consensus clustering algorithm is provided with the input data set and a set of clustering algorithms used to generate the individual clusterings. First, the baseline clustering algorithms are applied to the input data set to generate a set of individual clusterings. Each clustering algorithm may produce a different result due to its unique assumptions, parameters, and randomness. A consensus matrix is then constructed based on the similarity scores between the individual clusterings. The consensus matrix is a square matrix representing the degree of agreement between each pair of data points in the input data set. The consensus matrix is then factorized into two nonnegative matrices, W and H, using the MNMF. The factorization can be expressed as C ≈ W × H, where C is the consensus matrix, W is the cluster assignment matrix, and H is the consensus centroid matrix. The rows of the consensus centroid matrix are used as input for a chosen clustering algorithm to obtain a final consensus clustering solution. The chosen clustering algorithm may differ from the individual baseline clustering algorithms. Finally, each data point is assigned to the corresponding consensus cluster. Algorithm 1 summarizes the proposed PSA-MNMF algorithm, and Figure 2 describes the consensus part of the proposed method.
Algorithm 1: The PSA-MNMF consensus clustering algorithm.
Initialization: Define a matrix N ∈ {0,1}^{n×k} such that each row contains exactly one "1" and the rest of the values are zeros. Calculate NN^T: if i belongs to the same cluster as k, (NN^T)_{ik} = 1; otherwise it equals zero. Define L = N^T N.
Begin
Step 1: The optimization equation for PSA-MNMF is calculated subject to the constraint \tilde{N}^T \tilde{N} = I.
Step 2: At each iteration, the N value is updated.
Step 3: At each iteration, the L value is updated.
Step 5: The exhaustive method finds the best performance metric from among the top 10 recorded combinations.
Step 6: The final consensus clustering solution assigns each data point in the input data set to a consensus cluster.
Step 7: The algorithm returns H and the performance metrics α (including F-score, accuracy, precision, and recall).
End
The proposed MNMF-based consensus clustering has several advantages over traditional consensus clustering methods. The MNMF can handle high-dimensional data sets and extract meaningful features that capture common structures across different clusterings. Moreover, the MNMF can provide a more interpretable clustering solution, as it explicitly separates the cluster assignments from the consensus centroids. The PSA-MNMF is the first contribution toward post-stroke severity assessment that provides robust results using the position data and acceleration data in the frequency domain.

Data, Materials, and Methods
Generally, post-stroke motion datasets are rare among open-source datasets. The U-limb datasets, published in 2021, consist of data from 65 post-stroke and 91 healthy subjects collected in different clinical settings using the same protocol [80].
In this study, to deploy an unsupervised learning method, the data collected from stroke patients using wearable sensors and camera-based sensors were taken from the U-limb datasets. A research group at the University of Zurich (UZH) [81] implemented a 17-IMU sensor system, collected using the Xsens suite (Awinda, Xsens Technologies B.V., Enschede, The Netherlands), for 20 stroke patients. Each IMU included a 3D magnetometer, a 3D accelerometer, and a 3D gyroscope. Table 3 describes the participants' characteristics. The FMA-UE score for the patients, which was 46.00 ± 10.16, was in the moderate and mild categories according to the study [45]. Both the affected and non-affected hands were used for this study. The dataset selected for this research is an open-source dataset. The mean age of the participants in this study was 61.00 ± 10.69 years, and they included 5 females and 15 males. Eleven right hands and nine left hands were affected, and only one participant was known to have a dominant left hand. The four grasping-action activities chosen for this research are described in Table 3. The wearable sensor-based dataset is referred to as dataset-1 and the camera-based dataset as dataset-2.
Table 3. The activities of daily living tasks selected for this study [82].
The Hannover Medical School (MHH) research group collected position data from healthy and stroke-patient participants using motion-capture technology. The system comprised 12 MX Vicon cameras (Vicon Motion System Ltd., Oxford, UK) operated by Version 1.8.5 of the Nexus software. Twenty-one passive markers were attached to the upper body (thorax, upper arm, and forearm) to capture arm movements. Twenty stroke patients (12 male, 6 female) took part, with a mean age of 49.88 ± 16.92 years. The FMA-UE for this group was 17.75 ± 2.05; since it was less than 29, it was considered to fall within the severe category [45]. Only the affected hand of each stroke patient was captured. Twenty healthy participants were also included, 12 of them male, with a mean age of 46.77 ± 15.25 years. The dominant hand was tested in the healthy participants, 2 of whom were left-handed. Each participant repeated the four tasks three times, the same as for the sensor data. This research group was selected for our study because it had employed the same experiment and research protocol as the UZH group. Table 4 presents the characteristics of both the wearable and camera-based systems.

Data Preprocessing
The camera-based position data were collected at a 200 Hz sampling rate. The position time series from the cameras was filtered with a low-pass second-order Butterworth filter with a 20 Hz cut-off frequency to attenuate high-frequency noise not produced by human movement. The wearable sensor data were collected at a sampling frequency of 60 Hz and filtered with a low-pass second-order Butterworth filter with a 10 Hz cut-off. This section describes the wearable sensor dataset (dataset-1) and the camera-based dataset (dataset-2) separately. Figure 3 describes the general preprocessing steps used to obtain the position data in the frequency domain for the camera and wearable sensor datasets.

Figure 3. The preprocessing procedure.

Wearable Sensors (Dataset 1)
The 3D positions (x, y, z) of five major upper limb parts, consisting of the hand, shoulder, upper arm, forearm, and sternum (T8), were selected for this research. Therefore, 5 features with 3D predictor variables (i.e., 15 features) were used for each side of the body. Linear acceleration data were derived from the position data, and both the position and acceleration data were tested in the frequency domain. The following formula (Equation (30)) was used, following an earlier study [83], to decompose the 3D signal into a single 1D value at each time step, independent of the sensor orientations; the mean value was then derived for the acceleration data. Here, X, Y, and Z are the three dimensions of acceleration at each step.

h = √(X² + Y² + Z²) (30)
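A minimal sketch of this step, assuming acceleration is obtained by double numerical differentiation of position (the text does not specify the differentiation scheme, so `np.gradient` is our own choice):

```python
import numpy as np

def acceleration_magnitude(position, fs_hz):
    """Derive linear acceleration from 3D position by double numerical
    differentiation, then collapse it to the orientation-independent
    magnitude of Equation (30): h = sqrt(X^2 + Y^2 + Z^2).
    `position` has shape (n_samples, 3); returns shape (n_samples,)."""
    dt = 1.0 / fs_hz
    velocity = np.gradient(position, dt, axis=0)   # first derivative
    accel = np.gradient(velocity, dt, axis=0)      # second derivative
    return np.sqrt((accel ** 2).sum(axis=1))       # Equation (30)
```

Because the magnitude discards direction, the result does not depend on how each sensor was oriented on the body, which is the motivation given in the text.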

Camera-Based Sensors (Dataset-2)
The 3D positions from 11 markers at the wrist, ulnar bone, humerus bone, scapula, and trunk were collected from the camera-based dataset. Nine markers with 3 dimensions (x, y, z) were used for feature selection; therefore, 27 features were used for the camera dataset. The 4 markers on the trunk were used to define the trunk displacements. As with the wearable sensor dataset, acceleration was derived from the position data using Equation (30). Both the linear acceleration and the position data were tested in the frequency domain.

Trunk Displacement Measurement
Measurements were done according to an earlier study [84], using T8 from the wearable sensors and the average of the 4 trunk sensors from the camera-based system. Trunk displacements were specified by differences in the position and orientation of the sensor located at the sternum [85]. The mean of the first 10 data points was subtracted from the position data at each step for all x, y, and z directions. The following equation was used to find one value for each step.
Trunk Displacement = TDx + TDy + TDz (31)
Here, TDx is the trunk displacement in the x (frontal) direction at each step, and TDy and TDz are the trunk displacements in y and z, respectively. According to the literature [85][86][87][88], trunk movements are compensatory movements in stroke patients while performing tasks. The label for each cluster was assigned according to trunk displacement. For the camera-based dataset, the 4 markers located on the trunk were selected; the displacement of each marker was calculated with the above-mentioned method, and the average over the 4 markers was used as the final trunk displacement to label each cluster.
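A minimal sketch of Equation (31), assuming the per-axis displacements are summed with sign as written; the function name and array layout are our own:

```python
import numpy as np

def trunk_displacement(trunk_xyz, baseline_samples=10):
    """Trunk displacement per Equation (31) (illustrative sketch):
    subtract the mean of the first `baseline_samples` points in each of
    x, y, z, then sum the three per-axis displacements at every step.
    `trunk_xyz` has shape (n_samples, 3); returns shape (n_samples,)."""
    baseline = trunk_xyz[:baseline_samples].mean(axis=0)
    td = trunk_xyz - baseline            # TDx, TDy, TDz at each step
    return td.sum(axis=1)                # one value per step

# For the camera-based dataset, the same computation would be applied to
# each of the 4 trunk markers and the 4 resulting series averaged.
```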

Data Labeling
For stroke patients, excessive trunk displacement is a common motor compensation [89]: stroke survivors use trunk displacement as a compensatory movement during activities of daily living [82,90]. Therefore, in a novel labelling method proposed in this paper, trunk displacement was used to label each cluster: the greater the displacement, the more severe the stroke level. The cluster with the lowest average displacement was labelled the healthiest or mildest level. Each labelling result derived from each clustering was compared with the ground-truth FMA score for each patient.
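The labelling idea can be sketched as follows; the function name and the rank-based severity scheme are illustrative assumptions:

```python
import numpy as np

def label_clusters_by_displacement(cluster_ids, trunk_disp):
    """Order clusters by mean trunk displacement (a sketch of the labelling
    idea in the text): the cluster with the lowest mean displacement is the
    mildest level (rank 0), the highest is the most severe."""
    cluster_ids = np.asarray(cluster_ids)
    trunk_disp = np.asarray(trunk_disp, dtype=float)
    clusters = np.unique(cluster_ids)
    means = np.array([trunk_disp[cluster_ids == c].mean() for c in clusters])
    order = clusters[np.argsort(means)]            # mild -> severe
    severity = {c: rank for rank, c in enumerate(order)}
    return np.array([severity[c] for c in cluster_ids])
```

The resulting severity ranks can then be compared against the FMA-derived ground truth to compute accuracy, precision, recall, and F-score.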
The visualization summary of step-by-step methods employed in this paper is presented in Figure 4.

Experimental Analysis and Results
This section provides the experimental results and analysis of the proposed PSA-MNMF algorithm as well as of the baseline individual and consensus methods. Eight baseline clustering methods were employed: Fuzzy C-means, K-means, Self-Organizing Maps (SOM), Gaussian Mixture Models, DBSCAN, and Hierarchical, Spectral, and OPTICS clustering. The results of the MCLA consensus solver are also reported for comparison with the proposed PSA-MNMF. Accuracy, recall, precision, and F-score are reported. Two datasets were used: the wearable sensor-based dataset-1 and the camera-based dataset-2. We investigated the clustering results using a combination of position and acceleration in the frequency domain. Throughout, for the number of clusters k, k = 2 denotes 'severe' and 'non-severe', and k = 3 denotes 'severe', 'mild', and 'non-severe'.

The Averaged Normalized Mutual Information (ANMI)
If no prior information is available regarding the relative importance of the individual groupings, an appropriate objective for the consensus solution is to identify a clustering that maximizes the information shared with the original clusterings. Thus, to justify the choice of NMF as the baseline solver for our proposed model, we conducted a comparative analysis using Averaged Normalized Mutual Information (ANMI) to compare five consensus clusterings: HGPA, MCLA, HBGF, CSPA, and NMF [77,91,92]. Mutual information, a symmetric measure that quantifies the statistical information shared between two distributions [93], serves as a reliable indicator of the information shared between a pair of clusterings. Averaged normalized mutual information measures the amount of information that two splits (clusters and class labels) share, regardless of the number of clusters: the higher the score, the more that can be inferred about one split from the other. Since ANMI is normalized, values closer to 1 indicate better consensus clustering performance. The ANMI results of the NMF, MCLA, CSPA, HGPA, and HBGF clusterings are reported in Figures 5 and 6 for dataset-1 and dataset-2, respectively. The best baseline consensus solver for both datasets was NMF. Thus, the NMF solver was selected as the base consensus model of our proposed method, with modified objective factors, as shown in Section 5.
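ANMI can be sketched as the mean NMI between the consensus labels and each ensemble member; this minimal version uses scikit-learn's `normalized_mutual_info_score` (the wrapper function name is ours):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def anmi(consensus_labels, individual_labelings):
    """Averaged Normalized Mutual Information: the mean NMI between a
    consensus clustering and each individual clustering in the ensemble.
    Values close to 1 mean the consensus shares most of the ensemble's
    information; values near 0 mean it shares almost none."""
    return float(np.mean([
        normalized_mutual_info_score(consensus_labels, labels)
        for labels in individual_labelings
    ]))
```

Because NMI is invariant to relabelling, a partition identical to a member's up to a permutation of cluster IDs still scores 1 against it.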

The performance results for dataset-1 (wearable sensors) are reported in Tables 5-10. Tables 5 and 8 present the results for the position data in the frequency domain, Tables 6 and 9 the results for the acceleration data in the frequency domain, and Tables 7 and 10 the results for the merged position and acceleration data in the frequency domain.

Table 6. Accuracy, precision, recall, and F-score - Dataset 1 (acceleration in the frequency domain for k = 2).

Table 7. Accuracy, precision, recall, and F-score - Dataset 1 (merged position and acceleration in the frequency domain for k = 2).

As shown in Tables 5-10, for both the two-level and three-level assessments, the proposed PSA-MNMF showed the best performance among the baseline clustering methods. Figure 7 compares the position, acceleration, and merged position-and-acceleration results in the frequency domain between the proposed PSA-MNMF and the MCLA consensus method for k = 2 and k = 3. The proposed PSA-MNMF outperformed MCLA, achieving higher accuracy, precision, recall, and F-score, making it suitable for clinical settings.

For the camera-based dataset-2, all 8 algorithms were implemented and compared for k = 2 and k = 3 (where k is the number of clusters). Each clustering was run 100 times to reduce the variance of the results. For k = 2, the computational time was about 0.33 s per clustering run and approximately 88.08 s per consensus clustering run; for k = 3, it was about 0.416 s per clustering run and approximately 73.88 s per consensus clustering run. Tables 11-13 present the k = 2 results for the position data, the acceleration data, and the merged acceleration and position data in the frequency domain for dataset-2, respectively.
As shown in Tables 11-16, for both the two-level and three-level assessments, the proposed PSA-MNMF demonstrated the best performance compared to the baseline clustering methods. We can note from Figure 8 that the proposed PSA-MNMF outperformed MCLA by achieving higher accuracy, precision, recall, and F-score, thus making it suitable for clinical settings. Table 11. Accuracy, precision, recall, and F-score - Dataset 2 (position in the frequency domain for k = 2).

Discussion
This study aimed to determine stroke severity levels, evaluate the functions of the affected hands in post-stroke patients, and ultimately automate post-stroke assessment. Analyzing a stroke dataset requires advanced sensors capable of capturing stroke movements. One strength of this study was its integration of two different sensor technologies, wearable sensor-based and camera-based, that collected data using the same protocol and the same tasks. Both datasets focused on the upper limb of stroke patients and healthy participants performing activities of daily living. We have proposed a method to estimate stroke survivors' severity levels, more specifically to examine the functionality of the affected hands, by deploying unsupervised learning for the first time to cluster severity levels in stroke patients. Most studies utilize one to three clustering methods; to make our results more robust, 8 clustering methods were implemented for this research. An innovative approach, the PSA-MNMF clustering method, which combines all 8 clustering methods, was proposed and compared with the individual clusterings as well as with other consensus clustering algorithms such as MCLA. The proposed consensus clustering method offered more robust and consistent results than individual clustering and improved the performance measurements compared to other methods. In addition, the literature notes that trunk displacement is one of the compensatory movements frequently exhibited by stroke patients. Drawing on this, we proposed a novel labelling approach in which trunk displacement was used to label the severity level of each patient. The frequency-domain position data, the frequency-domain acceleration data, and their merged dataset were used as part of a unique approach for both datasets in this study.
All of the included clustering and consensus clustering models were run 100 times to reduce the variance. An exhaustive search was applied to find the best combination of clusterings, and the best 10 results among all possible combinations were selected and reported. Both the 2-cluster and 3-cluster results were reported for both datasets. The clustering results obtained with the trunk displacement method were compared with the FMA-UE scores reported in the open-source datasets, which had been obtained with the standard method, i.e., examination by clinical experts. Accuracy, precision, recall, and F-score were then computed by comparing the ground truths derived from the FMA-UE scores with the clustering labels. The results indicated that for all 12 models (for example, for dataset-1 with 2 clusters: position in the frequency domain, acceleration in the frequency domain, and the merged position and acceleration data in the frequency domain; and similarly for 3 clusters and for dataset-2), the PSA-MNMF algorithm presented the highest performance measurements, in the form of higher accuracy, precision, recall, and F-score, compared to the individual clusterings as well as to the MCLA solver. After the proposed PSA-MNMF, the MCLA solver, the ensemble clustering of all the individual methods, demonstrated the highest performance compared to the individual clustering results. The results derived from dataset-1 (wearable sensors) showed higher performance than those from the camera-based system; this could be because the camera-based dataset appeared to contain more noise than the wearable sensor dataset and required more preprocessing. Additionally, combining the two features, i.e., position in the frequency domain and acceleration in the frequency domain, did not produce any significant changes.
The consensus clustering results supported the hypothesis that combining several clusterings would enhance the final output. The proposed PSA-MNMF consensus solver presented better results than other solvers such as MCLA. Additionally, using the trunk displacement feature to label each cluster proved to be a promising method. Furthermore, a notable aspect of this study was that it included multiple data collection methods, specifically wearable sensors and camera systems. This choice showcases the versatility and wide-ranging applicability of our proposed method, serves as a key strength, and contributes to the overall richness and depth of this study. In conclusion, from a clinical perspective, the primary contribution of this study is to highlight the importance of transitioning from traditional clinical assessments to AI- and sensor-based systems for diverse assessments, specifically those focusing on functional abilities. Furthermore, the study emphasizes the potential of such systems to ultimately evaluate the quality of life of post-stroke patients. This shift in assessment methodology has the potential to enhance the accuracy, efficiency, and overall understanding of patients' functional capabilities and well-being, leading to improved care and better outcomes in post-stroke rehabilitation.

Conclusions and Future Directions
In this paper, motion capture data on stroke patients and healthy subjects collected at two different clinical universities were used. The Xsens and Vicon camera datasets were selected for this study since both captured upper-limb motion using a shared protocol. However, the inclusion of more datasets with similar protocols and the same data collection technology (such as wearable sensors) could further enhance automated stroke assessment, as well as healthcare and rehabilitation planning. Additionally, it must be noted that 4 similar tasks were selected from each motion capture method in this study; using identical tasks would enhance the clustering results. The experimental results indicated that consensus clustering enhances the cluster output when using the novel trunk-displacement labelling method proposed in this study. Future work could investigate a reduced set of sensors or markers and compare the results against using all upper-limb sensors or markers. Additionally, semi-supervised learning could be examined to automate assessment. Developing an open dataset of stroke patients performing FMA-UE tasks could also improve the comparison of semi-supervised learning with unsupervised clustering using FMA scores. In this study, whole actions were considered when computing the mean acceleration; examining smaller movement segments would be a worthwhile direction for future exploration. In addition, for larger datasets, gender-based and age-based analyses could be investigated.
Funding: This research received funding from Toronto Metropolitan University's Faculty of Engineering and Architectural Science.