Automatic Functional Shoulder Task Identification and Sub-Task Segmentation Using Wearable Inertial Measurement Units for Frozen Shoulder Assessment

Advanced sensor technologies have been applied to support frozen shoulder assessment. Sensor-based assessment tools provide objective, continuous, and quantitative information for evaluation and diagnosis. However, current tools for the assessment of functional shoulder tasks mainly rely on manual operation, which may cause several technical issues for the reliability and usability of the assessment tool, including manual bias during recording and additional effort for data labeling. To tackle these issues, this pilot study aims to propose an automatic functional shoulder task identification and sub-task segmentation system using inertial measurement units to provide reliable shoulder task labeling and sub-task information for clinical professionals. The proposed method combines machine learning models and rule-based modification to accurately identify shoulder tasks and segment sub-tasks. A hierarchical design is applied to enhance the efficiency and performance of the proposed approach. Nine healthy subjects and nine frozen shoulder patients were invited to perform five common shoulder tasks in lab-based and clinical environments, respectively. The experimental results show that the proposed method achieves an 87.11% F-score for shoulder task identification, and an 83.23% F-score with a mean absolute time error of 427 ms for sub-task segmentation. These results demonstrate the feasibility of the proposed method to support reliable clinical assessment.


Introduction
Frozen shoulder (FS) is a common joint condition that causes stiffness and pain among people aged from 40 to 65 years [1], especially in women [2]. The stiffness and pain of the shoulder joints limit the range of motion in all movement planes of the shoulder. FS has a great impact on the quality of daily life and the performance of activities of daily living (ADL) [2,3]. The common treatments for FS patients, involving physical therapy and shoulder joint injection, aim to relieve pain, improve joint mobility, and increase independence. In order to support clinical decisions, objective assessment is required for clinical evaluations and follow-up progress [4].
Goniometry measurements [5] and questionnaires [6] are common evaluation tools for clinical FS assessment. However, these traditional assessment approaches have several challenges and limitations related to inter-rater reliability, respondent interpretation, and cultural diversity [7][8][9]. In recent years, inertial measurement units (IMUs) have been used to develop objective evaluation systems. Joint evaluation systems using IMUs have advantages in simplicity of implementation, cost, and computational complexity. They have the potential to continuously and accurately measure the dynamic and static range of motion of the shoulder joints, including flexion, extension, and rotation [10]. Previous studies have shown the reliability of measurement systems with inertial sensors for elbow and shoulder movement in laboratory environments [10][11][12][13].
For FS patients, wearable IMUs have also been implemented to objectively measure functional abilities, whereas questionnaires can only provide subjective scores from the patients (e.g., the shoulder pain and disability index [14] and the simple shoulder score [15]). These works extracted movement features and parameters to evaluate the performance of functional shoulder tasks. However, the whole measurement process still relies on manual operation. For example, researchers or clinical professionals have to manually label the starting and ending times of the shoulder tasks from the continuous signals, and then label each spotted shoulder task with the correct task information. These additional efforts may decrease the feasibility and usability of IMU-based evaluation systems in the clinical setting.
To tackle the aforementioned challenges, this pilot study aims to propose an automatic functional shoulder task identification and sub-task segmentation system using wearable IMUs for FS assessment. We hypothesized that the proposed wearable-based system would be reliable and feasible for automatically providing shoulder task information for clinical evaluation and assessment. Several typical pattern recognition and signal processing techniques (e.g., symmetry-weighted moving average, sliding window, and principal component analysis), machine learning models (e.g., support vector machine, k-nearest-neighbor, and classification and regression tree), and rule-based modification are applied in the proposed system to accurately identify shoulder tasks and segment sub-tasks from continuous sensing signals. Moreover, a hierarchical approach is applied to enhance the reliability and efficiency of the proposed system. The novelty and contributions of this pilot study are as follows:

• This work is the first to propose a functional shoulder task identification system for automatic shoulder task labeling, while traditional functional measurement in the clinical setting still relies on manual operation.
• The proposed approach provides not only shoulder task information (e.g., cleaning head) but also sub-task information (e.g., lifting hands to head, washing head, and putting hands down). Such sub-task information has the potential to support clinical professionals in further analysis and examination.
• The feasibility and effectiveness of the proposed shoulder task identification and sub-task segmentation are validated on nine FS patients and nine healthy subjects.

Related Works
In recent years, automatic movement identification and segmentation algorithms have been proposed for clinical evaluation and healthcare applications [16][17][18][19][20]. The main objective of identification and segmentation algorithms is to spot the starting and ending points of target activities precisely. For example, previous studies have developed diverse approaches to automatically and objectively obtain detailed lower limb and trunk movement information, such as sitting, standing, walking, and turning [16]. Such reliable segmentation approaches can assist clinical professionals in various disease assessments, including Parkinson's disease [17], fall prediction [18], and dementia [19]. Similar approaches have also been applied to upper limb assessment in stroke patients. Biswas et al. [20] proposed segmentation algorithms using a single inertial sensor to extract three basic movements, namely extension, flexion, and rotation, from complex forearm activities in healthy subjects and stroke patients. However, few studies focus on the development of automatic systems for FS patients [11]. Most evaluation tools for FS assessment still rely on manual operation [10,[21][22][23][24]].

• Input and pre-processing: In the beginning, accelerometers and gyroscopes are utilized to collect shoulder task sequences (input). The sensing sequences are then pre-processed with the moving average technique to filter out noise. These pre-processed sequences are split into a training set and a testing set for the training and testing stages, respectively.
• Training for shoulder task identification: The feature extraction process with 12 feature types is first applied to the pre-processed sequences. Then, principal component analysis is employed to reduce the size of the feature set and select the critical features for training the machine learning models. Next, the machine learning model is trained with the selected features of the training set for shoulder task identification. Various machine learning techniques, including SVM, CART, and kNN, are investigated in this work, and parameter optimization for each technique is executed in this stage.
• Training for sub-task segmentation: First, the sliding window technique divides the pre-processed sequences into segments. Then, the feature extraction and dimension reduction techniques are employed to obtain the critical features from the segments. Lastly, the machine learning model for ML-based sub-task segmentation is built with the critical features. During this training stage, several machine learning techniques (e.g., SVM, CART, and kNN) and their optimized parameters are also explored.
• Testing for shoulder task identification: Initially, the selected features are extracted from the shoulder task sequence of the testing set. Then, these features are identified using the trained machine learning model, which outputs the shoulder task information (output 1).
• Testing for sub-task segmentation: After the testing stage of the shoulder task identification, the sliding window technique is first applied to the shoulder task sequence to gather a sequence of segments. Secondly, the feature extraction process is employed on the segments to obtain the selected features. Thirdly, the ML-based sub-task segmentation process classifies these segments and the corresponding features using the trained machine learning models and outputs a sequence of identified class labels. Fourthly, the rule-based modification is utilized to modify the output of the ML-based sub-task segmentation. Finally, the sub-task information generator generates a sequence of sub-task labels based on the classified and modified class labels and outputs it as the sub-task information (output 2).

Figure 1. The framework of the automatic shoulder task identification and sub-task segmentation.

Participants
Participants were outpatients at the rehabilitation department of Tri-Service General Hospital who were diagnosed with primary FS between June 2020 and September 2020. Patients were included if they had shoulder pain with a limited range of motion for more than 3 months and were aged from 20 to 70 years. Participants were diagnosed with primary FS according to a standardized history, physical examination, and ultrasonographic evaluation by an experienced physiatrist. Patients were excluded if they had any of the following: a full-thickness or massive tear of the rotator cuff on ultrasonography or magnetic resonance imaging (MRI); secondary FS (secondary to other causes, including metabolic, rheumatic, or infectious arthritis; stroke; tumor; or fracture); or acute cervical radiculopathy.
The study was approved by the institutional review board (TSGHIRB No.: A202005024) at the university hospital, and all participants gave written informed consent. Our research procedure followed the Helsinki Declaration. All participants were assured that their participation was entirely voluntary and that they could withdraw at any time. Nine healthy adults (height: 170.6 ± 7.9 cm, weight: 75.1 ± 17.0 kg, age: 27.0 ± 5.0 years old) and nine FS patients (height: 164.3 ± 11.1 cm, weight: 66.3 ± 14.4 kg, age: 56.4 ± 9.9 years old) participated in the experiments.

Experimental Protocol and Data Collection
Two IMUs placed on the arm and wrist are employed to sense upper limb movement, as shown in Figure 2. Similar sensor placements have been selected in previous works [20,21]. The sensors placed on the arm and wrist can capture upper limb movement information while the shoulder tasks are performed. The IMU used (APDM Inc., Portland, OR, USA) contains a tri-axial accelerometer, tri-axial gyroscope, and tri-axial magnetometer. In this study, only the tri-axial accelerometer (range: ±16 g; resolution: 14 bits) and tri-axial gyroscope (range: ±2000°/s; resolution: 16 bits) are used. The data are collected at a sampling frequency of 128 Hz.

The experiment is executed in the lab-based and clinical environments for healthy and FS subjects, respectively. Each subject is asked to perform five shoulder tasks once: cleaning the head; cleaning the upper back and shoulder; cleaning the lower back; placing an object on a high shelf; and putting/removing an object from the back pocket. These shoulder tasks have been widely adopted for shoulder function assessment and evaluation in previous works [21,22]. The performed shoulder tasks and the corresponding three sub-tasks are listed in Table 1. Each task consists of three sub-tasks. In total, there are 90 shoulder task sequences (18 subjects × 5 shoulder tasks). The participants are free to execute the tasks in their own way with basic manual instruction. The sub-tasks are performed continuously within the same shoulder task. The mean sub-task times performed by healthy subjects and FS patients are listed in Table 2.
An external camera synchronized with the inertial sensors is applied to provide reference information for the ground truth labeling, including the starting and ending points of the shoulder tasks. During the experiment, the camera is placed in front of the subjects and records at 30 frames per second.

Data Pre-Processing
This study applies the symmetry-weighted moving average (SWMA) technique to the sensing signals to reduce noise and artifacts for shoulder task identification and segmentation. This pre-processing technique has been applied in other applications where the sensors are placed on the upper limbs, including eating activity recognition and daily activity recognition [30,31]. The SWMA technique assigns different weights to the sample points within a determined range; data points closer to the central point are assigned higher weights.
Suppose the sensing data of any shoulder task sequence is defined as S = {s_i | i = 1, 2, ..., n_R}, where n_R is the total number of data samples in the sequence. The pre-processed sensing data point s̃_t at time t with the determined range m is defined as:

s̃_t = (1 / Total_δ) Σ_{j=−m}^{m} δ_j s_{t+j}, with δ_j = m + 1 − |j|, (1)

where Total_δ is the sum of all determined weights and the central weight δ_0 is m + 1. The SWMA with m = 9 is applied in this study.
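The smoothing step above can be sketched in a few lines of NumPy. The triangular weights δ_j = m + 1 − |j| are consistent with the stated central weight δ_0 = m + 1; the function name and the edge-replication padding at the sequence boundaries are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def swma(signal, m=9):
    """Symmetry-weighted moving average (sketch).

    Assumes triangular weights delta_j = m + 1 - |j| for offsets
    j = -m..m, so the central sample gets weight m + 1 and weights
    fall off linearly with distance from the centre.
    """
    signal = np.asarray(signal, dtype=float)
    offsets = np.arange(-m, m + 1)
    weights = (m + 1) - np.abs(offsets)        # delta_0 = m + 1
    total = weights.sum()                      # Total_delta
    padded = np.pad(signal, m, mode="edge")    # replicate edge samples
    out = np.empty_like(signal)
    for t in range(len(signal)):
        window = padded[t:t + 2 * m + 1]
        out[t] = np.dot(weights, window) / total
    return out
```

Each axis of each sensor would be smoothed independently before feature extraction.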

Feature Extraction
The main objective of the feature extraction process is to extract movement characteristics from the continuous sensing data for shoulder task identification. Two feature categories are applied to capture motion characteristics: statistical and kinematic features. The common statistical features of mean, standard deviation (StD), variance (var), maximum (max), minimum (min), range, kurtosis, skewness, and correlation coefficient (CorrCoef) have been applied in the field of activity recognition [32]; these nine statistical features are applied in this work. Kinematic features have also been applied in upper limb movement recognition systems for several clinical applications, such as stroke rehabilitation and assessment [33]. This study employs three general kinematic features, namely the number of velocity peaks (NVP), the number of zero crossings (NZR), and the number of mean crossings (NMR), for shoulder task identification.
Suppose a sequence of data from a sensor is defined as S = {s_i | i = 1, 2, ..., n_R}, where n_R is the total number of data samples in the sequence. Any sample point s_i = (r_i^x, r_i^y, r_i^z) comprises the data collected from a tri-axial sensor. Then, the feature extraction process is applied to the shoulder sequence. The utilized features are listed in Table 3.

Table 3. A list of statistical and kinematic feature types from a single sensor. Note: r_i^x, r_i^y, and r_i^z are the sample points of the x-, y-, and z-axes collected from a tri-axial sensor node.

In this work, the sensing data of the shoulder task sequence from two IMUs is defined as S_seq = {s̃_i | i = 1, 2, ..., n_seq}, where n_seq is the total number of samples in S_seq. Any sample point s̃_i of S_seq is defined as:

s̃_i = (a_xi^wrist, a_yi^wrist, a_zi^wrist, g_xi^wrist, g_yi^wrist, g_zi^wrist, a_xi^arm, a_yi^arm, a_zi^arm, g_xi^arm, g_yi^arm, g_zi^arm). (2)

The formation of the extracted features from S_seq is shown in Figure 3. With two IMUs providing four tri-axial sensors (2 accelerometers + 2 gyroscopes), a total of 144 features (4 sensors × 36 features) are obtained.

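One plausible way to arrive at 36 features per tri-axial sensor is to compute the per-axis statistical and kinematic feature types over each of the three axes (11 values per axis) plus the three pairwise axis correlation coefficients. The sketch below assumes NumPy; the exact peak and crossing definitions are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sensor_features(X):
    """36 features from one tri-axial sensor (n_samples x 3) -- a sketch.

    Per axis: mean, StD, var, max, min, range, kurtosis, skewness,
    NVP, NZR, NMR (11 values); plus 3 pairwise CorrCoef values.
    """
    feats = []
    for axis in range(3):
        x = X[:, axis]
        mu, sd = x.mean(), x.std()
        feats += [mu, sd, x.var(), x.max(), x.min(), x.max() - x.min()]
        # moment-based shape features, computed manually to stay NumPy-only
        z = (x - mu) / sd if sd > 0 else np.zeros_like(x)
        feats += [float((z ** 4).mean() - 3.0), float((z ** 3).mean())]
        dx = np.diff(x)
        feats.append(int(np.sum((dx[:-1] > 0) & (dx[1:] <= 0))))  # NVP: local maxima
        feats.append(int(np.sum(x[:-1] * x[1:] < 0)))             # NZR: zero crossings
        c = x - mu
        feats.append(int(np.sum(c[:-1] * c[1:] < 0)))             # NMR: mean crossings
    # pairwise correlation coefficients between the three axes
    for a, b in [(0, 1), (0, 2), (1, 2)]:
        feats.append(float(np.corrcoef(X[:, a], X[:, b])[0, 1]))
    return np.array(feats)
```

Applying this to the four sensors (two accelerometers and two gyroscopes) and concatenating the results yields the 144-dimensional raw feature vector.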


Feature Selection
During the training stage, the feature selection process is applied to all extracted features after the feature extraction, because the full set of 144 features is too large for efficient processing.
Using a suitable feature selection technique can simplify the computing processes, which benefits both the training and testing stages. This study utilizes principal component analysis (PCA) [34] to select critical features and reduce the number of features when dealing with multi-dimensional time sequence data. PCA aims to find a linear transformation matrix that transforms the raw feature vectors F = (f_1, f_2, ..., f_k) into lower dimensional feature vectors F̂ = (f̂_1, f̂_2, ..., f̂_l), where k = 144 is the dimension of the raw feature vectors and l is the dimension of the transformed feature vectors.
Firstly, the covariance matrix C_f is calculated based on the variance maximization of the projected data. Then, the eigenvalues λ = (λ_1, λ_2, ..., λ_k) and eigenvectors ν = (ν_1, ν_2, ..., ν_k) are determined from C_f. Note that the eigenvectors ν are the principal components, where the first eigenvector has the largest variance.
In the dimension reduction process, the l eigenvectors with the most explained variance are kept, where l ≤ k. A threshold thres = 0.99 is set to keep 99% of the variance information of the raw feature vectors. The minimum value of l is determined as Equation (3):

l = min { l′ | (Σ_{i=1}^{l′} λ_i) / (Σ_{i=1}^{k} λ_i) ≥ thres }. (3)

For the shoulder task identification, the number of features is reduced from 144 to 35 after the PCA and dimension reduction processes. Compared to the original raw feature vectors, the system using the transformed feature sets has the potential to reduce the computational complexity of shoulder task classification.
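The PCA-based reduction described above can be sketched with NumPy's eigendecomposition; the function name is illustrative and the centring step is an assumed (standard) preliminary:

```python
import numpy as np

def pca_reduce(F, thres=0.99):
    """Keep the minimum l components retaining >= thres of the variance (sketch).

    F: (n_samples, k) matrix of raw feature vectors.
    Returns the projected features and the number l of components kept.
    """
    Fc = F - F.mean(axis=0)                         # centre the features
    C = np.cov(Fc, rowvar=False)                    # covariance matrix C_f
    eigvals, eigvecs = np.linalg.eigh(C)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]               # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()  # cumulative explained variance
    l = int(np.searchsorted(explained, thres) + 1)  # minimum l with ratio >= thres
    return Fc @ eigvecs[:, :l], l
```

With the paper's data, this step reduces k = 144 raw features to l = 35 transformed features.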

Shoulder Task Identification Using Machine Learning
Suppose there is a set of class labels C = {c_1, c_2, ..., c_{n_C}}, where n_C is the number of class labels. The training set Γ_train = {(F̂_i^train, c_i) | i = 1, 2, ..., n_train} has n_train pairs of feature vectors F̂_i^train and corresponding labels c_i. In the training stage, the machine learning technique optimizes the parameters θ of a classification model by minimizing the classification loss on Γ_train. For the shoulder task identification, n_C = 5 is the number of shoulder tasks.
In the testing stage, the testing set Γ_test = {F̂_i^test | i = 1, 2, ..., n_test} has n_test feature vectors. Each F̂_i^test is mapped to the set of class labels C with corresponding confidence scores P_i = {p_i^j | j = 1, 2, ..., n_C} using the trained classification model H with the optimized parameters θ:

P_i = H(F̂_i^test; θ), (4)

where p_i^j is the confidence score of class c_j ∈ C. Then, we select the class label with the maximum confidence score as the final classification output:

c_i^* = argmax_{c_j ∈ C} p_i^j. (5)

Various machine learning models have been applied to segment human movements and recognize activities in other clinical applications [16][17][18][19][20]. Machine learning techniques requiring large data volumes for model training, such as HMM, CNN, and RNN, are not considered in this work. Therefore, we focus on exploring the feasibility of the following machine learning models for shoulder task identification:
• Support vector machine (SVM): The main objective of the SVM model is to find a hyperplane that separates two classes. It maximizes the margin between the two classes to support more confident classification. Since there are more than two classes, we employ the one-vs-all technique for multi-class classification with a radial basis kernel function.
• K-nearest-neighbors (kNN): kNN is also called a lazy classifier, as it requires no training process. The main idea is to determine the class of the testing data by majority voting among the nearest k neighbors. The value of k is application-dependent and has a critical influence on classifier performance. In this work, a range of k from 1 to 9 is explored; the results show that k = 7 achieves the best detection performance.

• Classification and regression tree (CART): The CART approach is a binary tree that can tackle both classification and regression problems. The branch size and the splitting process are determined by the Gini impurity measure. This approach has the advantages of easy implementation and high processing speed.
The feasibility and reliability of the explored techniques have been validated in the field of activity recognition [29].
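Of the three classifier families, kNN is the simplest to illustrate end to end. The sketch below is a minimal NumPy majority-vote classifier with k = 7 (the best-performing value reported above); it is an illustrative helper, not the authors' implementation:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=7):
    """Minimal k-nearest-neighbour classifier (sketch, k = 7 as in the paper).

    Majority vote over the k training samples closest in Euclidean
    distance to each test point.
    """
    preds = []
    for x in np.atleast_2d(X_test):
        d = np.linalg.norm(X_train - x, axis=1)  # distances to all training points
        nearest = np.argsort(d)[:k]              # indices of the k nearest
        votes = np.bincount(y_train[nearest])    # count the neighbours' labels
        preds.append(int(np.argmax(votes)))      # majority vote
    return np.array(preds)
```

For the SVM and CART models, an off-the-shelf toolkit with a radial basis kernel (one-vs-all) and Gini-based splitting would play the corresponding role.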

Sliding Window
Several windowing approaches have been proposed to divide continuous data into chunks [35], including the sliding window, event-defined window, and activity-defined window techniques. This work uses the sliding window to divide the data into small segments. This windowing approach is very popular in the field of activity recognition due to its simple realization and fast processing speed.
Suppose the pre-processed sensing data of the shoulder task sequence from two IMUs is defined as S_seq = {s̃_i | i = 1, 2, ..., n_seq}, where n_seq is the total number of samples in S_seq. The sliding window technique is applied to S_seq with several parameters: the window size ws, the starting point of the segment sp, the ending point of the segment ep, and the sliding samples ss. The pseudocode of the sliding window is described in Algorithm 1 and illustrated in Figure 4.


Algorithm 1. Sliding window.
Input: the pre-processed sensing data S_seq = {s̃_i | i = 1, 2, ..., n_seq}, window size ws, the starting point of the segment sp, the ending point of the segment ep, sliding samples ss
Output: a set of segments W = {w_j | j = 1, 2, ..., n_sl}
1: Begin
2: initialize sp ← 1, ep ← ws, and j ← 1
3: while ep ≤ n_seq do
4:   w_j ← {s̃_sp, ..., s̃_ep}
5:   sp ← sp + ss, ep ← ep + ss, j ← j + 1
6: end while
7: End

After the sliding window process, the set of segments obtained from the shoulder task sequence S_seq is defined as W = {w_j | j = 1, 2, ..., n_sl}, where n_sl is the total number of segments obtained from S_seq. Any segment is defined as w_j = {s̃_i | i = sp_j, ..., sp_j + ws − 1}, and the overlap between consecutive segments is os = ws − ss samples.
The window size has a great impact on system performance when using the sliding window technique. A range of window sizes from 0.1 to 1.5 s with a fixed overlap of 50% is tested to explore the reliability of the proposed automatic sub-task segmentation.
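Algorithm 1 can be sketched as follows (0-based indexing; the helper name is illustrative, not the authors' code). At the 128 Hz sampling rate, a 1.0 s window with 50% overlap corresponds to ws = 128 and ss = 64:

```python
def sliding_window(S, ws, ss):
    """Split sequence S into fixed-size windows (sketch of Algorithm 1).

    ws: window size in samples; ss: sliding step in samples,
    so consecutive windows overlap by os = ws - ss samples
    (50% overlap when ss = ws // 2).
    """
    segments = []
    sp, ep = 0, ws                 # 0-based start and end of the segment
    while ep <= len(S):            # stop when the window would run past the data
        segments.append(S[sp:ep])
        sp += ss
        ep += ss
    return segments
```

Any trailing samples shorter than ws are discarded, matching the fixed-size windows the pseudocode produces.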

Training Stage for Sub-Task Segmentation
Given that a set of segments W_TrSet = {w_j^train | j = 1, 2, ..., n_TrSet} is obtained from the pre-processed shoulder task sequences using the sliding window, each segment w_j^train contains n_ws sample points. Any sample point containing the sensing data collected from the wrist and arm is defined as s̃_j^train = (a_xj^wrist, a_yj^wrist, a_zj^wrist, g_xj^wrist, g_yj^wrist, g_zj^wrist, a_xj^arm, a_yj^arm, a_zj^arm, g_xj^arm, g_yj^arm, g_zj^arm). In this work, there is a set of class labels Ć = {ć_1, ć_2, ..., ć_{n_Ć}}, where n_Ć is 3, corresponding to sub-tasks A, B, and C.
There are three main processes for sub-task segmentation: ML-based segmentation, rule-based modification, and the sub-task information generator. The first process employs ML approaches to segment and identify sub-tasks; several typical machine learning approaches are tested, such as SVM, CART, and kNN. However, mis-segmentation and mis-identification are unavoidable during this process. Therefore, the second process corrects the errors of the ML-based approach. The modification process fixes fragmentation errors whose identified results are irrational in context; for example, a continuous data stream identified as sub-task B "washing head" should not contain other sub-tasks (e.g., lifting hands or putting hands down). Finally, the generator produces the sub-task information based on the outputs of the rule-based modification.
In the testing stage, a set of segments W_TeSet and the corresponding feature vectors are obtained from a pre-processed shoulder task sequence of the testing set S_TeSeq = {s̃_1, s̃_2, ..., s̃_{n_TeSeq}} by using the sliding window technique and feature extraction with the selected features, where n_TeSeq and n_S are the total numbers of samples in S_TeSeq and segments in W_TeSet, respectively. The detailed ML-based sub-task segmentation and rule-based modification processes in the testing stage are described as follows:
• Firstly, the mapped confidence scores Ṕ_i = {ṕ_i^1, ṕ_i^2, ..., ṕ_i^{n_Ć}} over the set of class labels Ć = {ć_1, ć_2, ..., ć_{n_Ć}} are calculated from each F̂_i^test, where n_Ć is the total number of labels in Ć.
• Secondly, each F̂_i^test maps to the class label ć^ML with the maximum confidence score by using the trained machine learning model H́ and the optimized parameters θ́. A sequence of classified class labels D_ML = {ć_1^ML, ć_2^ML, ..., ć_{n_S}^ML} is generated.
• Thirdly, the rule-based modification is applied to D_ML to obtain a sequence of modified class labels D_r = {ć_1^r, ć_2^r, ..., ć_{n_S}^r}. If ć_t^ML is different from both ć_{t−1}^ML and ć_{t+1}^ML, and ć_{t−1}^ML is equal to ć_{t+1}^ML, then ć_t^ML is modified to the sub-task of ć_{t−1}^ML and ć_{t+1}^ML, where ć_t^ML ∈ D_ML and 2 ≤ t ≤ n_S − 1. An example of the modification process is shown in Figure 5.
• Finally, a generator generates a sequence of sub-task labels D_g = {ć_1^g, ć_2^g, ..., ć_{n_g}^g} based on D_r, where n_g, the total number of labels in D_g, is determined as:

n_g = (n_S − 1) × ss + ws, (8)

where ws and ss are the window size and sliding samples, respectively. The processes of the sub-task information generator are illustrated in Figure 6 and the corresponding pseudocode is shown in Algorithm 2.
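The rule-based modification and the window-to-sample label generator can be sketched in plain Python; the function names are illustrative, and applying the fragmentation rule to a working copy of the label sequence is an assumption about edge handling:

```python
def rule_based_modify(labels):
    """Fix fragmentation errors (sketch of the rule-based modification).

    A window label that differs from both of its neighbours, while the
    neighbours agree with each other, is treated as a mis-identification
    and replaced with the neighbours' label.
    """
    fixed = list(labels)
    for t in range(1, len(fixed) - 1):
        if fixed[t - 1] == fixed[t + 1] != fixed[t]:
            fixed[t] = fixed[t - 1]
    return fixed

def generate_subtask_labels(window_labels, ws, ss):
    """Map window-level labels back to sample-level labels (sketch).

    Each of the first n_s - 1 windows contributes its ss new samples;
    the last window covers its full ws samples, so the output length
    is n_g = (n_s - 1) * ss + ws.
    """
    out = []
    for j in range(len(window_labels) - 1):
        out += [window_labels[j]] * ss
    out += [window_labels[-1]] * ws
    return out
```

For example, the label run B, A, B becomes B, B, B, mirroring the fragmentation fix illustrated in Figure 5.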
Sensors 2021, 21, x FOR PEER REVIEW 11 of 23 tasks (e.g., lifting hands or putting hands down). Finally, the generator generates the subtask information based on the outputs of the rule-based modification.
The ML-based sub-task segmentation and rule-based modification in the testing stage operate on a set of segments and the corresponding feature vectors, obtained from each pre-processed shoulder task sequence of the testing set by using the sliding window technique and feature extraction with the selected features.

Figure 5. An illustration of the modification process for fragmentation errors. The "sub-task A" of c_ML_t is a misidentified result that is modified to "sub-task B" according to the proposed rule-based modification.

Algorithm 2. The sub-task information generator.
Input: a sequence of modified class labels D_r = {c_r_j | j = 1, 2, ..., n_S}, window size ws, sliding samples ss
Output: a sequence of sub-task labels D_g = {c_g_i | i = 1, 2, ..., n_g}
1: Begin
2: initialize i ← 1
3: for j = 1 to n_S − 1 do //for the first n_S − 1 modified class labels
4:   while i ≤ (j × ss) do
5:     c_g_i ← c_r_j; i ← i + 1
6:   end while
7: end for
8: while i ≤ n_g do //the last window covers the remaining ws samples
9:   c_g_i ← c_r_nS; i ← i + 1
10: end while
11: End

This study utilizes a leave-one-subject-out cross-validation approach [32] to validate the system performance of the proposed shoulder task identification and sub-task segmentation. This validation approach divides the dataset into k folds based on the subjects, where k is the number of subjects; one fold is kept as the testing set and the remaining k − 1 folds are utilized for training. The whole process repeats k times until each fold has been used as the testing set. Finally, the system outputs the average results over the k folds.
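As a concrete illustration of this validation scheme, the following stdlib-only Python sketch (function and variable names are ours, not from the paper) holds out each subject's data exactly once:

```python
def leave_one_subject_out(samples, subject_ids):
    """Yield (held-out subject, training set, testing set) triples so that
    each subject's data serves as the testing fold exactly once."""
    for held_out in sorted(set(subject_ids)):   # k folds, one per subject
        train = [s for s, sid in zip(samples, subject_ids) if sid != held_out]
        test = [s for s, sid in zip(samples, subject_ids) if sid == held_out]
        yield held_out, train, test

# Toy example with k = 3 subjects.
data = ["x1", "x2", "x3", "x4"]
subjects = ["S1", "S1", "S2", "S3"]
for held_out, train, test in leave_one_subject_out(data, subjects):
    print(held_out, train, test)
```

The per-fold results would then be averaged over the k folds, as described above.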
In order to evaluate the reliability of the shoulder task identification, several typical metrics are utilized for performance evaluation, including sensitivity, precision and F-score [36], as shown in Equations (9)-(11):

sensitivity = TP / (TP + FN) (9)

precision = TP / (TP + FP) (10)

F-score = 2 × sensitivity × precision / (sensitivity + precision) (11)

where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively. The F-score is the harmonic mean of precision and sensitivity, which is a common approach to evaluate the reliability and performance of classification systems. Two evaluation and analysis approaches are applied for the evaluation of sub-task segmentation: the sample-based approach [36] and the mean absolute time error (MATE) [37-39]. An illustration of the evaluation approaches for sub-task segmentation is shown in Figure 7. The first approach calculates the numbers of TP, FP, TN, and FN based on a sample-by-sample mapping between the ground truth and the system outputs; the sensitivity, precision and F-score are then applied to assess the system reliability based on the mapping results. The second approach calculates the average of the absolute time errors between the reference and identified boundaries, where a boundary is the edge between two sub-tasks. Three MATE values are calculated for the proposed sub-task segmentation:
• MATE_A,B: MATE of the boundaries between sub-task A and sub-task B.
• MATE_B,C: MATE of the boundaries between sub-task B and sub-task C.
• MATE_overall: MATE of all boundaries, i.e., between sub-task A and sub-task B and between sub-task B and sub-task C.
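Both evaluation approaches reduce to a few lines of arithmetic. The following sketch illustrates them with invented counts and boundary times (the numbers are ours, for demonstration only):

```python
def classification_metrics(tp, fp, fn):
    """Sensitivity, precision and F-score from sample-based counts
    (Equations (9)-(11))."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * sensitivity * precision / (sensitivity + precision)
    return sensitivity, precision, f_score

def mean_absolute_time_error(reference, identified):
    """MATE: average absolute error between paired reference and identified
    sub-task boundaries (same time unit as the inputs, e.g., ms)."""
    return sum(abs(r - i) for r, i in zip(reference, identified)) / len(reference)

# Hypothetical sample counts and boundary times (ms).
sens, prec, f = classification_metrics(tp=85, fp=11, fn=13)
print(round(f, 4))
print(mean_absolute_time_error([1000, 2500, 4000], [1080, 2460, 4550]))
```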

Results

The experimental results of the shoulder task identification are shown in Table 4. The results show that the shoulder task identification using the SVM model can achieve 87.06% sensitivity, 88.43% precision and 87.11% F-score, outperforming the other ML models. However, the proposed approach using the SVM model is still weak at tackling several shoulder tasks such as T3 (cleaning the lower back) and T5 (putting/removing an object in/from the back pocket), while the F-scores of the other shoulder tasks can achieve over 90%.
Note. SVM: support vector machine; kNN: k-nearest-neighbors; CART: classification and regression tree.
The sensitivity, precision, and F-score of the sub-task segmentation using different ML approaches and window sizes are presented in Tables 5-7, respectively. Generally, the sub-task segmentation using the SVM and kNN models has similar performance in sensitivity, precision, and F-score, outperforming that using the CART model. The experimental results show that the proposed segmentation approach with the SVM model can achieve the best overall performance in sensitivity (82.27%), precision (85.07%) and F-score (83.23%), while the worst performance is with the CART model. Furthermore, the SVM model has the best F-scores of 86.53%, 82.75%, and 82.42% for sub-task A, sub-task B, and sub-task C, respectively.

Table 6. The precision of the sub-task segmentation using machine learning approaches (%) vs. different window sizes (s).

Table 7. The F-score of the sub-task segmentation using machine learning approaches (%) vs. different window sizes (s).

The results also reveal that the F-score of the sub-task segmentation using the SVM and kNN models significantly decreases when the window is larger than 1.0 s. Most configurations achieve the best performance with window sizes of 0.2 and 0.3 s. However, the CART model achieves its best F-score with a window size of 1.5 s.
Tables 8-10 present the sub-task segmentation performance in terms of MATE_A,B, MATE_B,C and MATE_overall using different machine learning models and window sizes for all subjects, healthy subjects and FS patients, respectively. Overall, the proposed segmentation using kNN achieves the lowest MATE_A,B, MATE_B,C and MATE_overall in most subject groups. However, the best machine learning models for the MATE_overall and MATE_B,C of the FS patients are SVM and CART, respectively. The lowest MATE_overall values for all subjects, healthy subjects and FS patients are 427, 273, and 517 ms, respectively. Also, the experimental results reveal that the MATE of the healthy subjects is lower than that of the FS patients. The impact of window sizes on the sub-task segmentation performance in MATE_A,B, MATE_B,C and MATE_overall is similar to that on sensitivity, precision and F-score. The proposed segmentation approach with different machine learning models has the lowest MATE values when the window size is smaller than or equal to 1.0 s. In particular, the results show that the proposed segmentation system using window sizes of 0.1 and 1.0 s can achieve the lowest MATE_A,B, MATE_B,C and MATE_overall.
An example demonstrating the processes of ML-based identification and rule-based modification for sub-task segmentation on a healthy subject is shown in Figure 8. It shows that a complete segment is often divided into fragments when the system uses ML-based segmentation only, as shown in Figure 8c. For example, a segment of sub-task B is divided into four fragments. The proposed rule-based modification can correct the segmentation errors caused by the ML-based sub-task segmentation, as presented in Figure 8d. After the processes of ML-based sub-task segmentation and rule-based modification, the remaining segmentation errors mainly occur at the boundaries between two sub-tasks, which decreases the performance of the proposed sub-task segmentation approach.
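The fragmentation-correction rule discussed above can be sketched in a few lines of Python. This is an illustrative re-implementation under our reading of the rule; the function name and label values are hypothetical:

```python
def rule_based_modification(labels):
    """Apply the fragmentation rule to a sequence of window-level class
    labels: a label that differs from both of its neighbours while the two
    neighbours agree is treated as a fragmentation error and replaced by
    the neighbouring label."""
    modified = list(labels)
    for t in range(1, len(labels) - 1):
        if (labels[t] != labels[t - 1]
                and labels[t] != labels[t + 1]
                and labels[t - 1] == labels[t + 1]):
            modified[t] = labels[t - 1]
    return modified

# The isolated "A" inside the run of "B"s is corrected to "B".
print(rule_based_modification(["A", "A", "B", "B", "A", "B", "B", "C", "C"]))
# → ['A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C']
```

As the rule only inspects the two immediate neighbours, it removes single-window fragments but, consistent with Figure 8, leaves errors at true sub-task boundaries untouched.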

Discussion
Various sensor technologies have been applied to develop objective evaluation systems, including range of motion measurement and function evaluation. To tackle the issues of labeling errors and bias during the measurement, we propose an automatic functional shoulder task identification and sub-task segmentation system using wearable IMUs for FS assessment. The proposed approach can achieve an 87.11% F-score for shoulder task identification, and an 83.23% F-score, 387 ms MATE_A,B and 403 ms MATE_B,C for sub-task segmentation. The proposed system has the potential to support clinical professionals with automatic shoulder task labeling and sub-task information.
The results show that the proposed shoulder task identification has poor performance on T3 and T5, as their F-scores are lower than 80%. This is because several FS patients are unable to move their hands to the lower back but can reach the back pocket while performing T3. The executions of T3 and T5 performed by the patients therefore have very similar movement patterns. Such a situation confuses the models in distinguishing T3 from T5, even for the SVM model.
Several machine learning models have been applied in this work, including SVM, CART and kNN. Previous works have shown the feasibility and effectiveness of these models in movement identification and segmentation [16-20]. The proposed segmentation approach using the SVM and kNN models achieves the best performance in F-score and MATE, respectively. However, their segmentation performances are very close under the two evaluation approaches. Considering that the kNN model has the advantages of lower computational complexity and simple implementation, the kNN model is more suitable for the proposed system.
Previous studies have shown that the sliding window approach is sensitive to the window size [35]. The proposed sub-task segmentation approach shows similar behavior, as the segmentation performance varies by more than 10% across different window sizes. This is because larger windows may smooth out the movement characteristics, which confuses the identification models and leads to misidentification. Also, using too large a window size may lead to early or late segmentation of the sub-tasks, which increases the segmentation errors of the proposed system. An illustration of the segmentation performance using smaller and larger window sizes is shown in Figure 9.
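One way to see why overly large windows cause early or late boundaries: a detected boundary can only be resolved to a window edge, so the quantization error tends to grow with the window size and stride. A toy sketch (the sample counts and boundary position are invented for illustration):

```python
def window_starts(n_samples, ws, ss):
    """Start indices of sliding windows of size ws and stride ss."""
    return list(range(0, n_samples - ws + 1, ss))

# A hypothetical sub-task boundary at sample 137 of a 300-sample recording:
# the detected boundary snaps to the nearest window edge, so coarser
# window/stride settings localize it less precisely.
boundary = 137
for ws, ss in [(10, 5), (50, 25), (100, 50)]:
    starts = window_starts(300, ws, ss)
    nearest = min(starts, key=lambda s: abs(s - boundary))
    print(ws, abs(nearest - boundary))
```

The printed error grows from 2 samples at the finest setting to 13 samples at the coarsest, mirroring the MATE trend reported above.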

Figure 10 shows the signals of T2 (cleaning the upper back and shoulder) collected from an FS patient and a healthy subject using a wrist-worn sensor. Due to the stiffness and pain of the shoulder, the FS patients perform the shoulder task slowly and carefully with a limited range of motion. The movement patterns of the three sub-tasks performed by the FS patient are clearly different from those performed by the healthy subject. This means the shoulder task can be performed in diverse ways according to the health status and the function of the shoulder, which leads to the identification challenges of variability and similarity for the shoulder task identification and sub-task segmentation [32].
To the best of our knowledge, this is the first study aiming to identify and segment the upper limb movements of shoulder tasks using machine learning approaches in FS patients, especially for FS assessment. Machine learning models have been successfully applied in automatic movement identification and recognition to analyze lower limb movements in other clinical applications [16-20]. However, most IMU-based shoulder function assessment systems still rely on manual operation [10,21-24]. Our results demonstrate the feasibility and effectiveness of ML-based functional shoulder task identification for supporting clinical assessment as a proof of concept. Moreover, the proposed system can obtain sub-task information from continuous signals, which has the potential for further analysis and investigation of functional performance.
Some technical challenges still limit the performance of the proposed system for shoulder task identification and sub-task segmentation, including gesture time, variability, similarity, and boundary decision. We plan to test other powerful machine learning models to improve the identification and segmentation performance, such as convolutional neural networks (CNN), long short-term memory (LSTM) networks, longest common subsequence (LCSS), dynamic time warping (DTW), hidden Markov models (HMM) and conditional random fields (CRF). Another limitation is that the proposed automatic system is validated on five shoulder tasks only. More shoulder tasks from other clinical tests and questionnaires will be explored for validation of the proposed system, e.g., the simple shoulder score [14], the American Shoulder and Elbow Surgeons score [40], and so on. Furthermore, only nine FS patients and nine healthy subjects participated in this work. More FS patients with different functional disabilities, healthy subjects of different ages, and different disease groups will be recruited for further validation and investigation.

Conclusions
In order to support FS assessment in the clinical setting, we propose a functional shoulder task identification system using IMUs for shoulder task identification and sub-task segmentation. We use several typical pattern recognition techniques, machine learning models and rule-based modification to automatically identify five shoulder tasks and segment three sub-tasks. The feasibility and reliability of this study are validated with healthy subjects and FS patients. The experimental results show that the proposed system has the potential to provide automatic labeling of shoulder tasks and sub-task information for clinical professionals.