Predicting Axial Impairment in Parkinson’s Disease through a Single Inertial Sensor

Background: Current telemedicine approaches lack standardised procedures for the remote assessment of axial impairment in Parkinson’s disease (PD). Unobtrusive wearable sensors may be a feasible tool to provide clinicians with practical medical indices reflecting axial dysfunction in PD. This study aims to predict the postural instability/gait difficulty (PIGD) score in PD patients by monitoring gait through a single inertial measurement unit (IMU) and machine-learning algorithms. Methods: Thirty-one PD patients underwent a 7-m timed-up-and-go test while monitored through an IMU placed on the thigh, both under (ON) and not under (OFF) dopaminergic therapy. After pre-processing procedures and feature selection, a support vector regression model was implemented to predict PIGD scores and to investigate the impact of L-Dopa and freezing of gait (FOG) on regression models. Results: Specific time- and frequency-domain features correlated with PIGD scores. After optimizing the dimensionality reduction methods and the model parameters, regression algorithms demonstrated different performance in the PIGD prediction in patients OFF and ON therapy (r = 0.79 and 0.75 and RMSE = 0.19 and 0.20, respectively). Similarly, regression models showed different performances in the PIGD prediction, in patients with FOG, ON and OFF therapy (r = 0.71 and RMSE = 0.27; r = 0.83 and RMSE = 0.22, respectively) and in those without FOG, ON and OFF therapy (r = 0.85 and RMSE = 0.19; r = 0.79 and RMSE = 0.21, respectively). Conclusions: Optimized support vector regression models have high feasibility in predicting PIGD scores in PD. L-Dopa and FOG affect regression model performances. Overall, a single inertial sensor may help to remotely assess axial motor impairment in PD patients.


Introduction
Parkinson's disease (PD) is a neurodegenerative disorder clinically characterized by bradykinesia, tremor, and rigidity [1]. Besides these cardinal signs, axial impairment, including gait and postural disorders, is among the most disabling symptoms, responsible for progressive motor impairment and frequent falls in PD [2,3]. According to the Hoehn and Yahr scale, the staging of PD is based on the severity of axial signs, including balance impairment and the ability to walk independently [4]. Currently, the clinical assessment of axial impairment in PD relies on the postural instability/gait difficulty (PIGD) score, which represents an accurate indicator of disease severity and prognosis [5].
Table 1. Demographic and clinical features of patients enrolled in the present study (mean ± standard deviation). H&Y: Hoehn and Yahr.

Experimental Protocol and Data Acquisition
Patients were asked to perform a 7-m timed-up-and-go (TUG) test consisting of the following procedures: (1) getting up from a chair; (2) walking in a straight line for 7 m; (3) turning; (4) walking back; (5) sitting down. To maximize the ecological value of our recordings and trigger the possible occurrence of FOG, the 7-m TUG test was performed in a free living-like environment with a number of factors simulating a domestic setting (e.g., passage from a spacious room to a narrow and furnished corridor with the interposition of an open door) [8]. Patients' gait was video-recorded through a camera and monitored by a single IMU placed and fixed on the thigh through an elastic band (Figure 1). The IMU was positioned on the patient's thigh so that, when the patient was standing, the y-axis represented the inverse gravity vector and the x-axis lay in the frontal plane. Hence, the angular velocity around the x-axis allowed a good representation of the thigh motion during linear gait. The STMicroelectronics system-on-board prototype neMEMSi [27] was equipped with the following components: a 9-axis IMU (LSM9DS0), integrating a 3-axis accelerometer and a 3-axis gyroscope; a Bluetooth V3.0 module (BT33); a lithium-ion battery; an ultralow-power 32-bit microcontroller (STM32L1). The sensor ranges were settable up to ±16 g and ±2000 dps for the accelerometer and gyroscope, respectively. A sample frequency up to 200 Hz can be used. The device size (including battery) is 25 mm × 30 mm × 4 mm (Figure 1). Moreover, neMEMSi includes a temperature sensor, a hygrometer sensor and a pressure sensor that were not used for this study. Table 3 reports technical characteristics of the inertial sensors embedded in the IMU (specifications refer to those set in this study). Before placement, a preliminary conventional calibration of the inertial sensors was performed, including software correction of the displacement of the IMU framework with respect to the earth framework.
Specifically, static acquisitions of both accelerometer and gyroscope data were carried out as indicated in [28,29]. The IMU was systematically arranged in specific positions on a table. The operations to correct or align the sensor with the reference framework were performed in real-time, with neMEMSi transmitting data via Bluetooth to the PC. Orientation was derived from the measurements and compared to the earth reference framework. The rotation between sensor and earth quaternions was calculated at each tested IMU position and used for orientation correction. Once the calibration procedure was finished, the IMU was positioned on the patient. The resulting data were sent in real-time to a personal computer through the neMEMSi Bluetooth module and progressively saved in CSV format. Each CSV file was related to a single test. Data in CSV files were processed offline as described in the next section.

Preprocessing
In this section, the signal processing steps performed prior to the statistical analysis and the regression task are described. First, a sensor fusion process was performed to compute the orientation signal from the raw accelerometer and raw gyroscope readings (Section 2.3.1). Then, the orientation signal was used to detect walking bouts from the entire TUG recording (Section 2.3.2). Finally, inertial data were segmented and temporal and spectral features were extracted from each stride (Section 2.3.3).

Orientation Estimation
A Kalman filter [30] was used to estimate the sensor orientation from the fusion of raw acceleration and angular velocity recordings. The sensor fusion algorithm iteratively alternates two processes: a prediction step and a correction step. The former consists of an approximation of the orientation estimate, performed through an integration of the gyroscope readings; the latter exploits accelerometer readings to correct the drift due to the integration of the slow-varying bias affecting the gyroscope measurements [31]. Figure 2 shows the raw gyroscope (a) and raw accelerometer (b) readings, and the orientation estimate (d) obtained using the Kalman filter (c). After orientation estimation, the acceleration, angular velocity, and orientation signals were filtered in order to keep only the frequency components of interest while removing mean values, low-frequency trends, and high-frequency noise. A second-order zero-lag band-pass Butterworth filter was used to keep only components in the 0.5-20 Hz band, while avoiding phase distortion.
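The prediction/correction scheme and the zero-lag band-pass step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (which was written in Matlab): the two-state model (tilt angle and gyroscope bias), the noise variances, the function names `kalman_tilt` and `bandpass_zero_lag`, and the choice of passing the filter order directly to SciPy are all assumptions made for illustration only.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def kalman_tilt(gyro, accel_angle, fs, q_angle=1e-5, q_bias=1e-6, r_meas=1e-3):
    """Fuse a gyroscope rate (rad/s) with an accelerometer-derived angle (rad).

    State vector: [angle, gyro bias]. The noise variances are illustrative
    guesses, not values taken from the study.
    """
    dt = 1.0 / fs
    x = np.zeros(2)                          # [angle, bias]
    P = np.eye(2)                            # state covariance
    F = np.array([[1.0, -dt], [0.0, 1.0]])   # state transition
    Q = np.diag([q_angle, q_bias])           # process noise
    H = np.array([1.0, 0.0])                 # only the angle is observed
    angle = np.empty(len(gyro))
    for k in range(len(gyro)):
        # Prediction: integrate the bias-corrected gyroscope reading.
        x = np.array([x[0] + dt * (gyro[k] - x[1]), x[1]])
        P = F @ P @ F.T + Q
        # Correction: pull the estimate toward the accelerometer angle.
        innovation = accel_angle[k] - x[0]
        S = H @ P @ H + r_meas
        K = P @ H / S
        x = x + K * innovation
        P = P - np.outer(K, H @ P)
        angle[k] = x[0]
    return angle, x[1]


def bandpass_zero_lag(x, fs, low=0.5, high=20.0, order=2):
    """Butterworth band-pass applied forward and backward (zero phase lag)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)
```

Note that forward-backward filtering squares the magnitude response, so the effective attenuation is steeper than that of the single-pass design; whether the study's "second-order" refers to the design order or to the effective order is not stated in the text.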

Walking Bouts Detection
In order to select only the walking segments of data, a continuous wavelet transform (CWT)-based approach was implemented, as is common in step detection algorithms [32,33]. CWT uses inner products to measure the similarity between the signal x(t) and an analysing function, which is a wavelet ψ. Equation (1) reports the formula for CWT computation. First, the wavelet is shifted by b ∈ R and stretched/compressed by a ∈ R⁺; then the shifted and stretched/compressed version of the wavelet, ψ*((t − b)/a), is compared to the signal x(t) in order to compute their similarity. This procedure is performed using a mother wavelet ψ and all possible values of a and b.
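For reference, the textbook definition of the CWT, consistent with the description above, can be written as follows. The 1/√a normalization is an assumption; other conventions (e.g., 1/a) are also in use, and the text does not say which one the study adopted.

```latex
C(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) \mathrm{d}t,
\qquad a \in \mathbb{R}^{+},\; b \in \mathbb{R}
```

Here ψ* denotes the complex conjugate of the mother wavelet, a is the scale, and b is the time shift.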
In this study, a Morse mother wavelet was used, due to its similarity with the orientation signal pattern during walking. Moreover, the scale parameter (a) was set so that the frequency analysis was performed in the range 0.5-2 Hz. This was done considering that stride time is rather heterogeneous in PDPs, due to the variability of motor features among patients [18], the pharmacological condition [8], and the gait velocity [34]. In [35], stride time in PD was found to be 1.13 ± 0.21 s, taking into consideration eleven studies on parkinsonian gait.
The scalogram obtained from the CWT is reported in Figure 3 (restricted to the frequency range 0-1 Hz), where the yellow zones correspond to the walking segments of the signal. In order to identify walking bouts, the intensity profile was computed for each value of the frequency scale; then, the obtained profiles were averaged; finally, the regions in which the average intensity profile exceeded the standard deviation value were selected. The result of this procedure is reported in Figure 4, where the walking bouts are identified in the orientation signal.
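The thresholding step described above can be sketched in a few lines of Python. The sketch assumes that a scalogram magnitude array (scales × samples) has already been computed, and that "the standard deviation value" refers to the standard deviation of the averaged intensity profile; both the function name `detect_bouts` and that interpretation are assumptions.

```python
import numpy as np


def detect_bouts(scalogram_mag):
    """Return (start, end) sample indices (end exclusive) of candidate walking bouts.

    scalogram_mag: 2-D array of CWT magnitudes, shape (n_scales, n_samples).
    """
    # Average the per-scale intensity profiles into a single profile.
    profile = np.asarray(scalogram_mag, dtype=float).mean(axis=0)
    # Assumption: the threshold is the standard deviation of that profile.
    mask = profile > profile.std()
    # Locate rising and falling edges of the boolean mask.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))
```

Each returned pair delimits a contiguous run of samples above the threshold; short spurious runs could be discarded afterwards with a minimum-duration rule, which the text does not specify.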

Segmentation and Feature Extraction
In each walking segment of the orientation signal, initial contacts (ICs) were identified as the positive orientation signal peaks [36]. To avoid possible double-peak detection, the orientation signal was low-pass filtered using a second-order zero-lag Butterworth filter with a cut-off frequency of 2 Hz. In addition, only peaks higher than the signal standard deviation and at least 0.5 s apart were selected. As suggested in [36], final contacts (FCs) correspond to the negative peaks following the ICs. The acceleration, angular velocity, and orientation recordings were segmented into windows corresponding to strides (i.e., from an IC to the subsequent IC), in order to prepare the data for the subsequent feature extraction step.
From each stride, a total of 102 features were extracted from the acceleration, angular velocity, and orientation signals. Features include spatiotemporal gait parameters, and both time- and frequency-domain features. For each stride i, stride time, stance time, and swing time were computed from the detected ICs and FCs. Tables 4 and 5 report the list of features extracted from the time and frequency domain, respectively. The listed features describe different aspects of the gait movement. For instance, Range, Std, and RMS are related to the movement amplitude and intensity; E_tot and binEnergy measure the energy content of the signal; Entropy and sEntropy describe movement complexity; DHwidth and DHratio are related to the stride regularity. The spectral features were computed from the Fast Fourier Transform (FFT) of the signal. In order to have homogeneous spectral representations of all strides from all patients, the number of points used to represent the FFT was set to n = T̄_stride · F_s, where T̄_stride is the average stride time found in PDPs [35] and F_s is the sample frequency. For strides lasting more than T̄_stride, a small loss of spectral resolution occurs, while for strides lasting less than T̄_stride, some points are added to the FFT, obtained as linear interpolation of the actual data points. In any case, a spectral resolution of at least 1 Hz is expected, which is adequate for the computation of the features listed in Table 5.
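The IC detection rules above (2 Hz zero-lag low-pass, peak height above the signal standard deviation, minimum spacing of 0.5 s) map naturally onto SciPy's peak finder. The following is a minimal sketch, assuming the threshold is computed on the smoothed signal; the function name `detect_initial_contacts` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, find_peaks, sosfiltfilt


def detect_initial_contacts(theta, fs, cutoff_hz=2.0, min_gap_s=0.5):
    """Return sample indices of candidate initial contacts in an orientation signal."""
    # Zero-lag low-pass filtering to suppress double peaks.
    sos = butter(2, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    smooth = sosfiltfilt(sos, theta)
    # Keep positive peaks above one standard deviation and at least min_gap_s apart.
    peaks, _ = find_peaks(
        smooth,
        height=np.std(smooth),
        distance=int(round(min_gap_s * fs)),
    )
    return peaks
```

Consecutive entries of the returned array delimit one stride each, so `theta[peaks[i]:peaks[i + 1]]` would be the i-th stride window.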
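As an illustration of the spatiotemporal parameters and of the fixed-length spectral representation, the sketch below assumes the usual gait definitions (stride: IC to next IC; stance: IC to the following FC; swing: FC to next IC) and one possible reading of the FFT step, namely resampling each stride to n points by linear interpolation before the transform. Both are assumptions, as are the function names; the original equations are not available in this text.

```python
import numpy as np


def gait_times(ics, fcs, fs):
    """Stride, stance, and swing times (s) from IC/FC sample indices.

    Assumes fcs[i] lies between ics[i] and ics[i + 1].
    """
    ics = np.asarray(ics, dtype=float)
    fcs = np.asarray(fcs, dtype=float)
    stride = (ics[1:] - ics[:-1]) / fs
    stance = (fcs - ics[:-1]) / fs
    swing = (ics[1:] - fcs) / fs
    return stride, stance, swing


def fixed_length_spectrum(stride_signal, fs, mean_stride_s=1.13):
    """Power spectrum of one stride represented on n = mean_stride_s * fs points."""
    n = int(round(mean_stride_s * fs))
    x = np.asarray(stride_signal, dtype=float)
    # Linear interpolation onto a common number of points (assumed reading).
    x_n = np.interp(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, len(x)), x)
    spectrum = np.abs(np.fft.rfft(x_n - x_n.mean())) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spectrum
```

With F_s = 200 Hz and a mean stride time of 1.13 s, n = 226 and the bin spacing is 200/226 ≈ 0.885 Hz, which is consistent with the stated resolution of at least 1 Hz. Resampling in this way normalizes stride duration, so the frequency axis is exact only for strides lasting 1.13 s.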

Table 4. Columns: ID, Feature, Component, Number, Equation, Explanation. Recoverable entries: Min; Corr (correlation between axis pair).
Table 5. List of spectral-domain features extracted in the present study, together with equations and some brief explanations. α: acceleration; ω: angular velocity; θ: orientation.
Table 5. Columns: Feature, Component, Number, Equation, Explanation. Recoverable entries: ratio between the energy of the principal harmonic and E_tot; ratio between energy in specific frequency bands and E_tot.

PIGD Prediction
This section describes the statistical processing following the extraction of the entire feature set for each patient's stride, intended to investigate the clinical significance of the extracted features. First, correlation analysis was performed between engineered features extracted from walking bouts and the clinical scores, by computing the Pearson correlation coefficient and the corresponding p-value for each feature-clinical score pair. Then, a regression model was implemented to predict the PIGD score of PDPs (Section 2.4.1). The analysis was performed in patients in both the OFF and ON state of therapy, to evaluate the effect of the pharmacological treatment on the performance of the prediction model (Section 2.4.2). Finally, to also evaluate the effect of FOG on model performance, patients were divided based on the clinical presence of FOG (Section 2.4.3).

PIGD Score Regression
The Pearson correlation coefficient (r) between the extracted features and the PIGD score was computed in patients both OFF and ON therapy. In order to reduce the dimensionality of the entire feature set (i.e., 102 features), the least significant features (i.e., those with |r| < 0.4) were discarded. To further reduce the set dimensionality, the features were ranked according to their prediction capability. This was done using two different approaches and evaluating their effect on the final prediction capability.
The first approach consisted of sorting the features in descending order of r, then selecting the first N features. The second approach made use of principal component analysis (PCA) to reduce the dimensionality of the feature set, keeping only the first N principal components. The parameter N was tuned in the range 5-25, estimating the effect of the different feature set dimensionality on the model performance. Figure 5 reports a schematic of the entire process. Feature scaling was applied to each feature using z-score normalization, which consists of removing the mean value and dividing by the standard deviation. This was done to make the feature ranges uniform, while reducing the effect of possible outliers. Then, range normalization was performed both on the feature set and on the target vector (i.e., PIGD score) to rescale data to the range [0, 1]. Concerning the regression model, a support vector regression (SVR) model [37,38] was implemented. In order to provide a robust performance evaluation, the model was tested using leave-one-subject-out (LOSO) cross-validation, which resembles the realistic working condition of the model. It consists of training the model with data from all patients except one, which is used as the test set. In order to optimize the model parameters, a LOSO-based training-validation procedure was carried out, selecting those parameters providing the best performance on the validation set. Kernel function, kernel scale, and cost (box-constraint) parameters were optimized for each SVR model, while the margin of tolerance (epsilon parameter) was set to the default value, corresponding to a tenth of the PIGD score standard deviation.
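The correlation-based screening and ranking can be sketched as follows. The sketch assumes that the 0.4 threshold and the ranking apply to the absolute value of r (negatively correlated features are reported among the top-ranked ones later in the paper); the function name `select_by_correlation` is hypothetical.

```python
import numpy as np


def select_by_correlation(X, y, n_keep=15, r_min=0.4):
    """Rank features by |Pearson r| with the target and keep the top n_keep.

    X: (n_samples, n_features) feature matrix; y: (n_samples,) clinical score.
    Returns the selected column indices and the full vector of r values.
    Constant feature columns are not handled here (they would yield NaN).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt(
        (Xc ** 2).sum(axis=0) * (yc ** 2).sum()
    )
    # Discard weakly correlated features, then sort the rest by |r|.
    candidates = np.flatnonzero(np.abs(r) >= r_min)
    ranked = candidates[np.argsort(-np.abs(r[candidates]))]
    return ranked[:n_keep], r
```

To avoid optimistic bias, such a selection would normally be recomputed inside each cross-validation fold; the text does not state whether this was done.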
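A LOSO evaluation of the PCA-based variant can be sketched with scikit-learn, under several assumptions: the study used Matlab, so this is not the authors' code; the order of the steps (z-score, range normalization, PCA, SVR) is one plausible reading of the pipeline; and the function name `loso_svr_predict` and its default hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVR


def loso_svr_predict(X, y, subject_ids, n_components=15, box_constraint=1.0):
    """Leave-one-subject-out predictions from a linear SVR (original score units)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y_pred = np.empty_like(y)
    for train, test in LeaveOneGroupOut().split(X, y, groups=subject_ids):
        # Rescale the target to [0, 1] using training data only.
        y_min, y_max = y[train].min(), y[train].max()
        y_train = (y[train] - y_min) / (y_max - y_min)
        model = make_pipeline(
            StandardScaler(),                 # z-score normalization
            MinMaxScaler(),                   # range normalization to [0, 1]
            PCA(n_components=n_components),   # keep the first N components
            # epsilon: a tenth of the (scaled) target standard deviation
            SVR(kernel="linear", C=box_constraint, epsilon=y_train.std() / 10),
        )
        model.fit(X[train], y_train)
        y_pred[test] = model.predict(X[test]) * (y_max - y_min) + y_min
    return y_pred
```

Fitting every preprocessing step inside the loop keeps the held-out subject out of the scaling and PCA statistics. Hyperparameter tuning would need a further inner loop over the training subjects, which is omitted here.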
The entire process is described in Algorithm 1. The goodness-of-fit was assessed using the metrics reported in Equation (2). The correlation coefficient (r) measures the strength of the linear association between predicted and true scores (its square gives the fraction of variability in the dependent variable explained by the model); it ranges between −1 and 1, with values closer to 1 indicating better performance. Root mean square error (RMSE) and mean absolute error (MAE) are absolute measures of the goodness of fit, quantifying the deviation from the target values. While MAE treats all errors the same, RMSE penalizes large prediction errors more heavily.
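For reference, the standard definitions of the three metrics are given below, with y_i the true score, ŷ_i the predicted score, and n the number of predictions. These are the textbook formulas and are assumed, not confirmed, to match Equation (2) of the original.

```latex
r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}
         {\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}\,\sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
```

Because RMSE squares each residual before averaging, RMSE ≥ MAE always holds, with equality only when all absolute errors are identical.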

The Effect of L-DOPA
Inertial data from PDPs were divided based on the pharmacological condition. Two independent datasets were obtained from patients in the OFF and ON state of therapy. The motor condition of patients while OFF and ON was compared by performing the Wilcoxon test on the MDS-UPDRS part III and on the PIGD score in the two pharmacological conditions. Then, the analysis reported in Figure 5 was performed, optimizing the model according to Algorithm 1. The performance obtained on patients OFF and ON was compared using different feature set sizes, different dimensionality reduction methods, and optimizing the regression model parameters. Finally, the performance of the model in patients OFF and ON therapy was compared.
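Since the same patients were assessed OFF and ON therapy, the comparison is a paired one. A minimal sketch, assuming the paired (signed-rank) form of the Wilcoxon test and using SciPy instead of the Matlab routines used in the study, could look as follows; the function name and any example values are hypothetical and are not study data.

```python
from scipy.stats import wilcoxon


def compare_off_on(scores_off, scores_on):
    """Two-sided Wilcoxon signed-rank test on paired OFF/ON clinical scores.

    Both sequences must list the same patients in the same order.
    Returns the p-value.
    """
    _, p_value = wilcoxon(scores_off, scores_on)
    return p_value
```

For example, `compare_off_on(pigd_off, pigd_on)` with two equally long lists of per-patient PIGD scores returns the p-value of the paired comparison.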

The Effect of Freezing of Gait
The dataset was split according to the clinical presence of FOG. Then, the Mann-Whitney U test was used to compare both the clinical scores and the engineered features of FOG+ and FOG− patients. Next, the analysis reported in Figure 5 was performed, optimizing the model according to Algorithm 1. The performance obtained on patients with and without FOG was compared using different feature set sizes, different dimensionality reduction methods, and optimizing the regression model parameters. The entire procedure was carried out for each pharmacological condition. Finally, the effect of FOG on the model performance was evaluated.
All the experiments were executed in Matlab R2020a, using a personal computer with Microsoft Windows 10, a 2.4 GHz Intel® Core i5-6200 processor, 8 GB RAM and 4 GB GPU.
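The FOG+ versus FOG− comparison described in this section involves two independent groups, hence the Mann-Whitney U test. The sketch below uses SciPy for illustration only (the study itself was run in Matlab); the function name is hypothetical.

```python
from scipy.stats import mannwhitneyu


def compare_fog_groups(values_fog_pos, values_fog_neg):
    """Two-sided Mann-Whitney U test between FOG+ and FOG- patients.

    Each argument holds one value per patient (a clinical score or a feature).
    The groups may have different sizes. Returns the p-value.
    """
    _, p_value = mannwhitneyu(values_fog_pos, values_fog_neg, alternative="two-sided")
    return p_value
```

When the test is repeated over many engineered features, a correction for multiple comparisons may be appropriate; the text does not report whether one was applied.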

Clinical-Behavioural Correlations
Pearson correlation analysis showed that most time- and frequency-domain features significantly correlated with PIGD scores. In more detail, as axial motor control worsened, the minimum value of inertial signals increased, whereas maximum and root mean square values of inertial signals, average height of peaks in the time-domain and height of the dominant harmonic decreased. Table 6 summarizes the Pearson correlation coefficients and the respective p-values for different feature-PIGD pairs. Only the most informative features for either therapeutic condition, i.e., those with a Pearson correlation coefficient with PIGD score larger than 0.5, were included in the table. Figure 6 reports the scatter plots for the average height of the dominant harmonic (mean DH height) versus PIGD score OFF and ON.

PIGD Score Regression
This section reports results from the optimized support vector regression models in LOSO validation. Specifically, Section 3.3 summarizes findings concerning the effect of L-Dopa by comparing regression models in patients OFF and ON therapy. The best model configuration was identified for each pharmacological condition and the performance of the regression models was compared. Section 3.4 reports findings concerning the effect of FOG occurrence, by comparing regression models in FOG+ and FOG−. The best model configuration was extracted for each subgroup of patients and the performance of the regression models was compared.

The Effect of L-DOPA
The Wilcoxon test demonstrated that both the MDS-UPDRS part III and the PIGD score were different in patients OFF and ON therapy (p < 0.001). Table 7 summarizes the performance of the regression model in terms of correlation coefficient, RMSE, and MAE, in PDPs in the OFF and ON state of therapy. Results are reported for different sizes of the feature set and different dimensionality reduction methods.
Based on the results from Table 7, visually reported in Figure 7, the following considerations were derived.
• Model: SVR with linear kernel was selected in 85% of cases; top performances were obtained with linear kernel and small values of the box-constraint parameter (i.e., <0.009).
• Number of features: increasing the feature set size did not ensure progressively better performances (Figure 7). Best results were obtained with n = 15 features, both for patients OFF and ON therapy.
• Dimensionality reduction: for larger feature set sizes (i.e., more than 15 features), PCA-based dimensionality reduction always yielded better results than correlation-based feature selection (Figure 7). The PCA-based dimensionality reduction method led to the best results both for patients OFF and ON therapy.
• Performance: regression models provided better performances in patients OFF than in those ON therapy.
Consequently, the best regression model parameters were identified for each pharmacological condition. Then, such models were trained on patients ON (OFF) therapy and tested on patients OFF (ON) therapy. This procedure resulted in r = 0.70 (0.67), RMSE = 0.57 (0.42), and MAE = 0.47 (0.15). When the model was tested using LOSO on all available data, regardless of the pharmacological condition, r = 0.64, RMSE = 0.22, and MAE = 0.17 were obtained from an SVR with linear kernel and box-constraint = 0.07. Figure 8 reports the true versus predicted score scatter plot, together with the best-fit line.

The Effect of Freezing of Gait
Table 9 reports the performance of the optimized regression models for FOG+ and FOG− patients in the ON state of therapy, visually reported in Figure 9. From Figure 9, it can be inferred that, when the size of the feature set increased, regression models provided comparable performance in FOG+ and FOG−. This was particularly evident for n = 25 features, for which r and RMSE were very similar in the two populations, regardless of the dimensionality reduction method. From the results above, the following considerations were derived. The performance gap between PDPs with and without FOG may be due to the different discrimination power of some features in the two populations. From Table 9, it turns out that, for each feature set size, the correlation between top-ranked features and PIGD score is larger in patients without FOG. Top-ranked features for those patients were found to be Min (r = 0.79, p = 0.001), vPeaks (r = −0.75, p = 0.004), RMS (r = −0.72, p = 0.006), hPeaks (r = −0.70, p = 0.008) from the x-axis orientation signal, and E_tot (r = −0.79, p = 0.001) from the x-axis angular velocity signal.
As for PDPs with FOG, top-ranked features included Min (r = 0.66, p = 0.004), DH height (r = −0.62, p = 0.008), RMS (r = −0.58, p = 0.015), hPeaks (r = −0.58, p = 0.016) from the y-axis acceleration signal, and DH height (r = −0.65, p = 0.005) from the x-axis orientation signal. Table 10 reports the performance of the optimized regression models for FOG+ and FOG− patients in the OFF state of therapy, visually reported in Figure 10. Based on the results above, the following considerations were derived.
• Model: SVR with linear kernel was selected in 95% of cases; top performances were obtained with linear kernel both in patients with and without freezing of gait.
• Number of features: increasing the feature set size did not ensure progressively better performances (Figure 10). Best results were obtained with n = 25 (n = 15) features in patients with (without) freezing of gait.
• Dimensionality reduction: PCA-based (correlation-based) dimensionality reduction was selected for patients with (without) FOG.
• Performance: regression models provided slightly better performances in patients without FOG, in terms of RMSE (Figure 10), independently of the model configuration; performance in terms of r depends on the regression model parameters, with best results superior in patients with FOG (Table 10).
Based on the considerations above, the dimension of the feature set was set to 25 (15) and the dimensionality reduction method to PCA-based (correlation-based) for FOG+ (FOG−). Then, the best regression models were trained on FOG+ (FOG−) and tested on FOG− (FOG+). This procedure resulted in r = 0.73 (0.69), RMSE = 0.36 (0.33), and MAE = 0.25 (0.25), respectively. Table 11 reports all the results obtained for each population under investigation and for each pharmacological condition; results were obtained using the LOSO test. As evident from the table, model performance improved when patients in different pharmacological conditions were considered separately. Concerning the effect of FOG, if the model is specifically trained on FOG+ and FOG− separately, the performance significantly improves in patients without FOG while in the ON state of therapy and in patients with FOG while OFF therapy. Table 12 reports the results obtained by training and testing the regression model on different populations (i.e., ON versus OFF therapy, FOG+ versus FOG−). Prediction errors provided by the global model (i.e., the regression model trained and validated on all subjects, independently of the pharmacological condition and of freezing of gait) were compared to those obtained using different models for each pharmacological condition separately. The Wilcoxon test showed that the difference in prediction errors was not statistically significant (p = 0.074); thus, a single model may be used to estimate the PIGD score. On the other hand, training the model on specific subgroups (e.g., patients with FOG, patients ON therapy) and testing on different subgroups led to a large performance impairment, as evident from the RMSE values reported in Table 12. Summarizing these findings, it is possible to implement a very general algorithm, but care should be taken to collect a very general dataset, including patients in different pharmacological conditions, as well as patients with and without FOG.
The large prediction errors observed when training and testing the model on different populations may be due to the different discrimination power of some features. As can be observed in Figure 11, the sensitivity of some features to changes in PIGD score depends on the pharmacological condition. Some features were found to exhibit strong correlation with PIGD in patients ON therapy but not in patients OFF therapy, and vice versa. The same behaviour can be observed when training and testing the regression model on patients with and without FOG, while ON therapy. As previously reported when discussing Table 9, top-ranked features were different in FOG+ and FOG− patients; thus, the prediction model performance worsens when trained and tested on different populations. Finally, from Table 12, it can be noticed that performance was not significantly impaired when training the model with data from FOG+ (FOG−) patients and testing on data from FOG− (FOG+) patients, while OFF therapy. In this case, common top-ranked features included minimum value, root mean square value, and average height of peaks from the orientation signal; average value of the angular velocity signal; height of the dominant harmonic from the acceleration signal along the y-axis.

Discussion
Machine-learning algorithms can reliably predict PIGD scores in PDPs during gait through sensor-based recordings. In this study, a homogeneous cohort of PDPs was recruited and the PIGD scores were calculated in patients OFF and ON therapy, according to standardized clinical procedures. To further control for clinical biases, patients were allocated to the FOG or non-FOG group according to the direct observation of FOG episodes rather than only considering patients' records. Furthermore, to maximize the prediction performance, many time- and frequency-domain features were computed in addition to classical spatio-temporal parameters routinely used in gait analysis studies. Lastly, a comprehensive statistical analysis was provided on both clinical scores and engineered features, to provide deeper insights into the capability of features to measure axial motor impairment in PDPs.
Significant correlations were found between specific sensor-based variables in the time- as well as frequency-domain and PIGD scores, suggesting that higher PIGD scores are associated with greater kinematic abnormalities during gait in PDPs. In more detail, higher axial motor impairment measured with PIGD was associated with greater abnormalities in movement amplitude, intensity, and regularity in PDPs. In line with these findings, the authors of [23] found significant associations of PIGD scores with sensor-based measures, including the number of walking bouts, gait speed and sway area. These findings also agree with previous studies showing higher impairment of spatio-temporal gait parameters in PDPs presenting a PIGD phenotype with more severe axial dysfunction than those with a tremor-dominant phenotype [39,40]. Accordingly, our clinical-behavioural correlations lay the foundations for elaborating PIGD prediction models based on the considered time- and frequency-domain features.
When considering PIGD prediction with respect to L-Dopa intake, regression models had better performances in PDPs OFF than in those in the ON state of therapy (p = 0.002). The finding of better performance in PDPs OFF with respect to those ON therapy is in line with previous results, reporting variable accuracy of ML algorithms in the sensor-based assessment of motor disorders in PDPs in different pharmacological conditions [41]. Indeed, L-Dopa significantly changes spatiotemporal stride parameters and, accordingly, acts on the ML performance when measuring gait in PD [41,42]. Since the PIGD score consists of several items reflecting both postural and walking abilities, a possible explanation for different ML performances in patients OFF and ON therapy relies on the heterogeneous L-Dopa sensitivity of balance and gait in PD. Indeed, unlike gait, L-Dopa usually does not substantially impact balance in PD [7,43,44]. Therefore, we hypothesize that PIGD scores are more accurately predicted in patients OFF than in those in the ON state of therapy owing to a more similar trend of postural and walking abilities in patients not under dopaminergic therapy. The performance of our regression models in PDPs both OFF and ON therapy was higher than that reported in a previous study testing PDPs only ON therapy [23]. Moreover, while in [23] different activities (i.e., gait, turn, stance) were analysed to provide the final output, in the present study only features extracted from walking bouts were used to predict the PIGD score. Our results suggested that it is possible to implement a single regression model capable of predicting PIGD in PDPs, regardless of the therapeutic condition and the presence of freezing of gait. However, data should be collected from a heterogeneous cohort of PDPs, under different pharmacological conditions.
When models were trained on a subgroup of patients (i.e., patients with FOG, patients ON therapy), impaired performance was observed when testing on a different subgroup.
Several time- and frequency-domain features, such as Min, RMS, Mean and DHheight, were found to have different sensitivity in patients with and without FOG, a finding fully in agreement with previous studies demonstrating worse continuous gait abnormalities in PD patients with FOG than in those without FOG, also outside FOG episodes [45–47]. Moreover, different time- and frequency-domain features also explain another relevant finding of this study, that is, models trained in patients OFF therapy do not perform well in patients ON therapy and vice versa. It is likely that, in addition to changes of continuous gait parameters and pharmacological condition, the unpredictable and sudden appearance of FOG affects the walking pattern in PDPs, worsening ML performances. In addition, despite the direct impact on gait, FOG is not included in the calculation of the PIGD score and, accordingly, is not considered for the assessment of axial impairment when using this standardized clinical index.
Unlike the only previous study predicting PIGD scores in PD, which used three sensing devices [23], only a single, small, and lightweight wearable inertial sensor placed on the thigh was used here, providing a minimally demanding and unobtrusive solution for everyday application in free-living settings. However, future studies are necessary to clarify the technical feasibility of applying the proposed ML algorithms to data recorded through smartphones in non-supervised environments.
In this study, the experimental tools, including wearable sensors and ML algorithms, were already largely used and validated [48–51]. The novelty of our study primarily relies on the application of these tools to overcome the clinical need for quantitative measures reflecting axial impairment in PD. However, when considering the findings of this study, the lack of validation on an independent test set is a possible limitation to be accounted for. Indeed, data from thirty-one subjects were used for the analysis. Accordingly, additional studies are necessary to reproduce these findings in larger cohorts of patients.

Conclusions
A single inertial sensor placed on the thigh may be a feasible wearable solution for the remote assessment of axial impairment in PD by predicting the PIGD score through dedicated ML algorithms. When implementing prediction models of PIGD scores, patients' pharmacological conditions and FOG occurrence are significant clinical variables to be considered. The use of an unobtrusive sensing system composed of a single inertial sensor supports the future adoption of commonly available smartphones, embedding inertial systems, for the long-term motor assessment in PD. Accordingly, future studies will address the need for collecting additional sensor-based data in PDPs to further implement subject-dependent prediction models. In addition, data collection will be performed directly in unsupervised conditions to monitor free-living daily activities and obtain more ecologically valid measures.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.
Acknowledgments: The authors wish to thank Luigi Della Torre from STMicroelectronics (Agrate Brianza, Italy) for providing neMEMSi devices.

Conflicts of Interest:
The authors declare no conflict of interest.
