A Multimodal Approach to the Quantification of Kinetic Tremor in Parkinson’s Disease

Parkinson’s disease results in motor impairment that deteriorates patients’ quality of life. One of the symptoms negatively interfering with daily activities is kinetic tremor which should be measured to monitor the outcome of therapy. A new instrumented method of quantification of the kinetic tremor is proposed, based on the analysis of circles drawn on a digitizing tablet by a patient. The aim of this approach is to obtain a tremor scoring equivalent to that performed by trained clinicians. Models are trained with the least absolute shrinkage and selection operator (LASSO) method to predict the tremor scores on the basis of the parameters computed from the patients’ drawings. Signal parametrization is derived from both expert knowledge and the response of an artificial neural network to the raw data, thus the approach was named multimodal. The fitted models are eventually combined into model ensembles that provide aggregated scores of the kinetic tremor captured in the drawings. The method was verified with a set of clinical data acquired from 64 Parkinson’s disease patients. Automated and objective quantification of the kinetic tremor with the presented approach yielded promising results, as the Pearson’s correlations between the visual ratings of tremor and the model predictions ranged from 0.839 to 0.890 in the best-performing models.


Introduction
One of the Parkinson's disease (PD) symptoms related to the motor function is kinetic tremor, i.e., tremor occurring during voluntary movement [1]. Even though this tremor is not considered crucial for diagnosis, which is based on the presence of other motor manifestations including bradykinesia, rigidity, and rest tremor [2,3], it significantly affects patients' quality of life [4]. The frequency of the kinetic tremor is reported to remain within 4-9 Hz, reaching higher than the frequency of the rest tremor (4 to 6 Hz) and lower than physiological tremor (8 to 12 Hz) [1,4,5]. Additionally, it is a highly intermittent symptom [5].
In a clinical setting, the condition of a patient suffering from PD is typically determined with clinical scales, mainly the Unified Parkinson's Disease Rating Scale (UPDRS) [6,7]. Such assessments performed by medical personnel are subject to human-dependent variability, which impose a noise-like component on the scores. Specifically, it reduces applicability of the ratings performed by humans in case of monitoring the disease progression in its early stages, when changes in motor function are less evident [7]. Development of the instrumented methods of motor function quantification follows the need for reduction of subjectivity and improvement of resolution in relation to the existing scales.
Moreover, reliable monitoring of the disease from its onset is necessary to provide feedback for an early neuroprotective therapy, if such is introduced.
In this study, the kinetic tremor is recorded with a digitizing tablet during a simple task of continuously drawing circles along a given path. The quantification is performed by models that utilize two classes of parameters, specifically: (1) parameters computed directly from the acquired signals, based on the knowledge of disease symptoms, and (2) parameters computed from the activations of echo state networks (ESN) (a class of artificial neural networks (ANN)), collected after passing the acquired trajectories through a set of ESNs.
Teaching predictive models based on both of the aforementioned groups of parameters, being a multimodal approach, is a novelty in the field of the kinetic tremor diagnostics. The solutions found in literature are based exclusively on one of two types of parameters mentioned above, commonly with a disease diagnosis or a disease stage classification as an output [8][9][10][11][12][13][14].
As for the quantification of kinetic tremor, two of the cited studies are relevant. Lin et al. [12] presented a system for assessment of the tremor severity through a digital analysis of spiral drawings by PD and essential tremor (ET) patients. The quantification was based on signal-derived parameters (no processing with ANNs was performed), and predictions correlated well with visual ratings provided by physicians. In the second study, comparably good results were reported by Legrand et al. [8], where the authors recreated visual ratings of ET patients' tremors (according to the Bain and Findley rating scale) from seven independent raters, employing the analysis of spiral drawings in time and frequency domains.
An interesting example of parametrizing data from a digitizing tablet by features based on field-specific knowledge can be found in the study by Lopez-de-Ipna et al. [13] that was targeted on early ET diagnosis. The authors used linear and nonlinear features (the latter being complexity metrics, namely a Shannon entropy and a fractal dimension) of the drawings of spirals as an input for the classifiers such as support vector machine, multilayer perceptron neural network (MLP), and k-nearest neighbors algorithm. The achieved classification accuracies were reported to be the greatest for the MLP classifiers. It should be noted that this neural network has not been used as a source of signal parameterization, whereas in our approach it serves as such.
The two following studies showed applicability of ESN architecture for diagnosis of PD. Gallicchio et al. [11] presented an approach in which time series acquired during drawing of spirals were analyzed with deep ESNs without the signal preprocessing and feature extraction steps. Another attempt of employing ESNs to analyze and classify minimally processed signals may be found in a study by Lacy et al. [10] which addressed the problem of differentiating PD patients and control subjects. These are examples of ANN usage in which the network activations are closely related to the raw movement data, which is in line with our approach. However, ESNs served as classifiers here, while the tremor quantification is of our main interest.
The aim of our study is to look for a synergic effect in a combination of (1) a classical signal processing and (2) a signal exploration with an artificial neural network, in the task of the quantification of kinetic tremor. Additionally, only the models that provide continuous tremor scoring are considered, as any discretization of response may potentially hinder monitoring of the disease progression in its early stage.

Measurement System
The measurement system consists of a digitizing tablet (Intuos 2, XD-0608-U, Wacom, Vancouver, WA, USA; resolution: 0.01 mm, sampling: 100 samples/s, accuracy: ±0.25 mm, levels of pressure: 1024, upper cut-off frequency (−3 dB): 13 Hz-derived from the tablet frequency response which was measured by drawing circles along the diagonal of the active area with frequencies ranging from 1 to 25 Hz, with a motorized, rotating pen holder) connected to a personal computer running custom acquisition software [15]. Two-dimensional pen trajectory and pen pressure time series were recorded during the measurement. The acquired data were analyzed in LabVIEW 2014 (National Instruments, Austin, TX, USA) and R environment (R Base ver. 3.4.4, Vienna, Austria) [16].
During the examination, a patient traced along the given template of a circle (130 mm in diameter), centered over the tablet active area. There was no preferred direction of drawing and the drawing speed was roughly specified as "moderate". Each examination took approximately 100 s. The patient performed the task while seated and with their elbow of the examined arm raised from the table and unsupported. The examination was carried out for both hands.
Exemplary signals acquired from the patient having severe tremor (rated 68 points in UPDRS part III-motor examination) are presented in Figure 1 along with respective power spectral densities (estimated with the Welch's method; parameters: 1024 frequency bins, Hann window of 1000 samples, 80% overlap). A spectral peak related to the patient's tremor is seen between 4 and 5 Hz for each signal. The large-amplitude artifacts, present between 25 and 50 s of the pen pressure time series, arise due to accidentally drawing outside the tablet active area.  [16]. During the examination, a patient traced along the given template of a circle (130 mm in diameter), centered over the tablet active area. There was no preferred direction of drawing and the drawing speed was roughly specified as "moderate". Each examination took approximately 100 s. The patient performed the task while seated and with their elbow of the examined arm raised from the table and unsupported. The examination was carried out for both hands.
Exemplary signals acquired from the patient having severe tremor (rated 68 points in UPDRS part III-motor examination) are presented in Figure 1 along with respective power spectral densities (estimated with the Welch's method; parameters: 1024 frequency bins, Hann window of 1000 samples, 80% overlap). A spectral peak related to the patient's tremor is seen between 4 and 5 Hz for each signal. The large-amplitude artifacts, present between 25 and 50 s of the pen pressure time series, arise due to accidentally drawing outside the tablet active area.

Data
The measurements were taken from 64 PD patients, of which 30 were females (mean age ± standard deviation (SD): 55.8 ± 10.1) and 34 were males (mean age ± SD: 57.9 ± 10.4). The patients underwent a surgical procedure (implantation of deep brain stimulation) and/or were treated with pharmacotherapy. In this study, however, the type of treatment was not considered as a differentiating factor.
The examinations were carried out in the Department of Neurology of the Medical University of Warsaw (research was under approval of the Bioethics Committee of the Medical University of Warsaw, all subjects gave written informed consent in accordance with the Declaration of Helsinki).

Data
The measurements were taken from 64 PD patients, of which 30 were females (mean age ± standard deviation (SD): 55.8 ± 10.1) and 34 were males (mean age ± SD: 57.9 ± 10.4). The patients underwent a surgical procedure (implantation of deep brain stimulation) and/or were treated with pharmacotherapy. In this study, however, the type of treatment was not considered as a differentiating factor.
The examinations were carried out in the Department of Neurology of the Medical University of Warsaw (research was under approval of the Bioethics Committee of the Medical University of Warsaw, all subjects gave written informed consent in accordance with the Declaration of Helsinki). The patients were repeatedly examined by trained clinicians during subsequent appointments, which resulted in 1100 recorded drawings accompanied by UPDRS ratings. The item 21 from the UPDRS scale (UPDRS.21) was of special interest for this study, as it is meant for scoring of the kinetic or postural tremor of hands.
The kinetic tremor in each recording was rated with a proposed scale, named tremor_score. The tremor_score is based on the visual analysis of a 2D trajectory and a power spectrum of the recorded drawing. The score is assigned from 0 ("no tremor") to 10 ("severe tremor") on an integer, monotonic scale. To improve consistency of ratings, these should be performed in uninterrupted sessions with subsets of data as large as possible. The rating person does not accompany patients during their examinations, nor do they have any additional information about them, such as duration of the disease or assigned UPDRS scores.
The tremor_score is designed as a modification of the UPDRS.21, with the following differences: only the kinetic tremor is considered (specifically, scoring of the postural tremor is avoided), the range of scores is extended and assumed as strictly monotonic (the UPDRS.21 scores range from '0' to '4' and scores '2' and '3' differ only in the presence of the postural tremor, while both correspond to the kinetic tremor of "moderate" amplitude; as such, the UPDRS.21 scores are weakly monotonic in relation to the kinetic tremor), and, finally, the rating person has insight in the power spectrum of the signal in order to support scoring of small-intensity tremors.
We note that the tremor_score differs from the widely adopted scale of tremor severity by Bain and Findley [17] in that it: (1) encompasses visual analysis of the signal power spectrum and (2) enables the rating person to subset the recording and to plot the original trajectory only partially. We consider the latter especially important, as in the case of circle-drawing task, the circles drawn by a patient in subsequent repetitions overlay and intersect, which can obscure manifestations of tremor.

Spectrum-Based Tremor Quantification
Occurrence of a tremor component in a signal manifests itself in a form of prominent peaks in its power spectrum. A method of quantification of such a tremor manifestation is proposed, yielding a single parameter tQ, being a ratio of a tremor-related signal power and an averaged background signal power. The tQ parameter is obtained from the traced trajectory (stored in the form of time series of Cartesian coordinates {x(t), y(t)}) in the following steps: 1.
The acquired time series {x(t), y(t)} are high-pass filtered with 0.5 Hz cut-off frequency and transformed with short-time Fourier transform (STFT) configured as follows: Hann window of 512 samples, 5-sample step (0.05 s), 256 frequency bins.

2.
Combined spectrogram STFTc( f , t) is computed as a sum of squared STFTs of x(t) and y(t) according to Equation (1): 3.
The STFTc( f , t) spectrogram is divided into 2-s-long segments with 0.05 s step. For each segment of the spectrogram, a relevant power spectral density (PSD) denoted by We have observed that a common feature of the obtained S i ( f ) is the presence of a relatively large spectral component with the power density reciprocally related to the frequency. It can be written in the form S Ri ( f ) ∝ f −α , where α takes values greater than 0. S Ri ( f ) may be attributed to the spectral leakage from a large amplitude and low frequency signal of circle tracing, an artefact of the filtering from step 1. This component should be removed because it hinders the detection of tremor-related peaks. To remove S Ri ( f ) from S i ( f ), a function of the form S Ni ( f ) = b i f −α i is fitted to the spectrum in the range 2-26 Hz (fitting is performed in logarithmic coordinates, using the method of least absolute residuals, as this method shows insensitivity to large outliers [18], here being the spectral peaks). The S i ( f ) is divided by the fitted function S Ni ( f ), yielding the residual spectral density S Ti ( f ).

5.
S Ti ( f ) is divided into 26 frequency bins that range from 2 to 15 Hz, each having the width of 0.5 Hz. For every bin, the signal variance (power) is computed, denoted by σ 2 i,k for k-th bin. The mean frequency of the bin with the greatest signal variance is denoted by f tQ,i . 6.
Variances related to the i-th data segment are sorted in ascending order, thus forming a 26-element set W i , where k-th variance is denoted by W i,k . The corresponding parameter tQ i is then computed according to Equation (2). Variances W i, 25 and W i, 24 are not averaged in the denominator of the equation. It is assumed that these may be related to W i,26 by conveying the power of signal harmonics, hence they should not be considered for background power estimation: (2)

Power Spectral Density and Signal Entropy
In addition to the computation of the tQ parameter, the power of the signal in the band above 3 Hz is computed. Hence, a parameter named PSDc_var_3Hz is obtained based on the total PSD, being a sum of partial PSDs of x(t) and y(t). The PSD is computed with the Welch's method (1024 frequency bins, Hann window of 1000 samples, 80% overlap).
As the analyzed signals represent activity of a complex system, we decided to include a measure of their complexity in the parameters set. We refer to the sample entropy (SampEn) [19], defined as the negative natural logarithm of the conditional probability that two distinct sequences of m samples that are similar (i.e., the distance between them is smaller than the threshold r, where the distance is measured with the Chebyshev metric) will remain similar after extending them with the next samples (i.e., to the length m+1). To analyze the signal in different time scales, a multiscale entropy (MSEn) method can be applied [20], in which the computation of the SampEn is preceded by averaging and decimation of the original signal {x 1 ,..., x i ,..., x N }. The samples of the resulting coarse-grained signal y (τ) are calculated, according to the equation [20]: where: τ is decimation (scale) coefficient, x i is i-th sample of the original signal, and N is the length of the original signal. The MSEn is calculated for the drawing velocity component (denoted by V t ) that is transverse to the reference circle. Long-term trends are removed from the signal during preprocessing when the signal is high-pass filtered over 0.05 Hz (a finite impulse response (FIR) filter with a Hann window of 2000 samples).
The set of 19 parameters derived from the time series was extended with log-transformations of the parameters that had the most skewed distributions in randomly sampled data subsets. The skewness was computed from 100 randomly sampled subsets, each consisting of 60% of the complete dataset, and the parameters which average skewness minus one standard deviation was greater than 3 were transformed with base 10 logarithm (4 parameters met this condition). The resulting 23 parameters are summarized in Table 1.

PSDc_var_3Hz PSDc_var_3Hz_log
Total power of the signal in the band above 3 Hz and its log 10 transformation f_tQ_mod Absolute difference of f_tQ_mod and 5 Hz, i.e., |f_tQ_mod-5|

avg_tQ_max avg_tQ_max_log
Average of tQ in the segments representing 10 s of measurement with the highest tQ values (i.e., 200 segments with highest tQ in the case of 0.05 s segment step) and its log 10 transformation sd_tQ_max sd_tQ_max_log Standard deviation of tQ in the segments as used in avg_tQ_max computation and its log 10 transformation

Echo State Network Architecture
An echo state network is a type of recurrent neural network (RNN), in a classical approach characterized by a randomly generated hidden layer with untrained connections, where only the output weights are subject to supervised training [21,22]. Such a hidden layer is called reservoir [23], and it is able to memorize the time series fed to the network. The reservoir should satisfy the echo state property, i.e., the state of the reservoir should be uniquely defined by the fading history of the input signal. Echo state property is ensured by the proper adjustment of the network hyperparameters, which are parameters governing the distribution of connection weights.
In this work, we assume the following update equation of the ESN with N neurons in the reservoir [22]: where: x(t) ∈ R N is a vector of reservoir activations at the time step t, u(t) ∈ R M×T is an M-dimensional input signal of length T, W in (t) ∈ R N×M+1 and W(t) ∈ R N×N are the input and reservoir weight matrices, respectively, [•; •] is the vertical concatenation operator, α ∈ (0, 1] is the leaking rate and tan h(•) is the element-wise hyperbolic tangent function.
The ESN output is defined as: where: y(t) ∈ R My is the M y -dimensional network output and W out ∈ R My×N+1 is the output weight matrix. A schematic illustration of the ESN architecture is presented in Figure 2.
The ESN output is defined as: where: ( ) ∈ ℝ is the My-dimensional network output and ∈ ℝ × is the output weight matrix.
A schematic illustration of the ESN architecture is presented in Figure 2. The reservoir matrix and input weights are initialized randomly from uniform distributions over [ −0.5; 0.5 ] and [ −0.5 ; 0.5 ] , respectively, where ω is an input scaling parameter. The matrix W is typically generated sparse, with sparsity (i.e., proportion of non-zero elements in matrix) denoted by sW.
An important hyperparameter of the network is a spectral radius ( ) of the reservoir, i.e., the maximum absolute eigenvalue of . When network is configured with leaky integration in nodes ( 1), the effective spectral radius has to be calculated according to the equation [24]: where ∈ ℝ × is an identity matrix. In most cases, setting < 1 is sufficient to ensure the echo state property. The spectral radius of matrix is adjusted in two steps: first, the matrix is elementwise divided by its current spectral radius, and, second, it is element-wise multiplied by the value of the desired spectral radius. The ESN shows increased processing capability in the state between ordered and chaotic dynamics, that is at the edge of chaos [25,26]. The stability of dynamical system may be estimated by its largest Lyapunov exponent, i.e., a measure of divergence (in the state space) of the system trajectories with infinitesimally small initial separation, as defined in Equation (7) [25]: where: is the initial distance between the two considered trajectories and is the distance at time k.
The value of < 0 is characteristic for stable systems, while > 0 indicates that the system shows chaotic behaviour. The phase transition occurs at the aforementioned edge of chaos for 0. As the largest Lyapunov exponent is defined asymptotically, it has to be estimated (we denote the estimated exponent by ). In this work, we use method described in [25], where it was adopted from the more generic approach found in [27]. Moreover, we propose and implement a complementary method of adjusting of a network according to the iterative procedure presented in Algorithm 1, founded on the assumption of The reservoir matrix W and input weights W in are initialized randomly from uniform distributions over [−0.5; 0.5] and [−0.5ω; 0.5ω], respectively, where ω is an input scaling parameter. The matrix W is typically generated sparse, with sparsity (i.e., proportion of non-zero elements in matrix) denoted by s W .
An important hyperparameter of the network is a spectral radius ρ(W) of the reservoir, i.e., the maximum absolute eigenvalue of W. When network is configured with leaky integration in nodes (α ≤ 1), the effective spectral radius has to be calculated according to the equation [24]: where I ∈ R N×N is an identity matrix. In most cases, setting ρ W < 1 is sufficient to ensure the echo state property. The spectral radius of matrix is adjusted in two steps: first, the matrix is element-wise divided by its current spectral radius, and, second, it is element-wise multiplied by the value of the desired spectral radius. The ESN shows increased processing capability in the state between ordered and chaotic dynamics, that is at the edge of chaos [25,26]. The stability of dynamical system may be estimated by its largest Lyapunov exponent, i.e., a measure of divergence (in the state space) of the system trajectories with infinitesimally small initial separation, as defined in Equation (7) [25]: where: γ 0 is the initial distance between the two considered trajectories and γ k is the distance at time k. The value of λ max < 0 is characteristic for stable systems, while λ max > 0 indicates that the system shows chaotic behaviour. The phase transition occurs at the aforementioned edge of chaos for λ max ≈ 0. As the largest Lyapunov exponent is defined asymptotically, it has to be estimated (we denote the estimated exponent byλ). In this work, we use method described in [25], where it was adopted from the more generic approach found in [27].
Moreover, we propose and implement a complementary method of adjustingλ of a network according to the iterative procedure presented in Algorithm 1, founded on the assumption of monotonic relation between ρ W andλ. The algorithm does not ensure thatλ is set with assumed tolerance ∆λ 0 , as the closest reached solution is returned. If the result of the adjustment procedure is not acceptable, one should reinitialize W matrix and rerun the procedure.  The recurrent structure of the ESN enables memorization of input sequences. The measure of network short-term memory is called memory capacity (MC) and is estimated through teaching the ESN to recover its past inputs. The MC is calculated according to [25], with uniformly random time series from the interval [−1; 1] used as an input.

ESN Input Preprocessing
Acquired trajectories {x(t), y(t)} were preprocessed before being used as an ESN input u(t), according to the following procedure:

1.
Components x(t) and y(t) were bandpass filtered (FIR filter with 1-12 Hz passband, Hann window of 1000 samples) and decimated with factor 2.

2.
Principal component analysis (PCA) was performed on the filtered data and the first component PC 1 was retained for further computations.

3.
To normalize the data, the PC 1 component was divided by its interquartile range (IQR).

Parameters Based on the ESN Activation
States of the ESN for i-th processed trajectory are combined as columns, yielding an activation matrix S i ∈ R N×T . Each activation matrix is summarized by a vector of parameters, for S i defined as follows: where: is the vector of time-averaged activations of neurons (operator [:, j] denotes extraction of the j-th column of a matrix) and is the vector of standard deviations of activations of neurons.
Vectors w i@+1 out R N and w i@−MC/2 out R N are the output weights of ESN trained for prediction (one step ahead, thus superscript "i@ + 1") and reconstruction of delayed signal (MC/2 steps back, thus superscript "i@ − MC/2"), respectively. In both cases, the output layer is trained using linear regression and the bias component is removed from the obtained vectors of weights. To embed the information about the original signal amplitude in the weights, these are calculated for the task of prediction/reconstruction of the unnormalized signal, that is, the component PC 1 is not divided by its IQR when used as a target during training (see Section 2.6).

Score Prediction Model
The parameters computed from the complete dataset are merged into matrices which are used to train and test models of the tremor_score and UPDRS.21 scales. Three matrices of parameters are considered: D TS -a matrix of 23 parameters retrieved from time series, D ESN -a matrix of 4N parameters based on ESN structure and activation, D ESN+TS -a row-wise combination of D TS , and D ESN matrices. Each matrix is split into two submatrices for training and testing of the models, which consist of 60% and 40% of the original matrix, respectively. The split is performed by stratified random sampling to keep proportion of scores from the complete dataset. For each scale, the following models are considered: M TS : y TS = g( f TS (D TS )), where: f ... (•) is a linear function of parameters, fitted to predict the score of interest from the parameters matrix D ... , y ... is a prediction of the score, g(•) is a limiting function as defined in (14), {y max , y min } are the maximum and minimum values of the modelled scales, that is, {10, 0} and {4, 0} for the tremor_score and UPDRS.21, respectively. The f ... (•) functions are fitted using the least absolute shrinkage and selection operator (LASSO) method [28]. The LASSO method performs variable selection and regularization, which is considered especially important in the case of high-dimensional training sets that include ESN-derived parameters. The LASSO output depends on a regularization parameter λ LASSO chosen with 10-time, 5-fold cross-validation. Two values of cross-validated λ LASSO are considered, that is, λ MIN and λ 1SE . The parameter λ MIN corresponds to the models with the minimal mean squared error (MSE) of prediction. The parameter λ 1SE corresponds to the models with the minimal number of non-zero coefficients and prediction MSE not greater than the minimal MSE computed during cross-validation, increased by its standard error [29].
Due to the random initialization of the ESNs, the models that are based on their activations (M ESN and M ESN+TS ) show increased variability of prediction quality when compared with the models that use only parameters computed from the time series (M TS ). To reduce the effect of initialization randomness on the prediction, an ensemble of models is established [30,31]. The ensemble is formed by combining ESN-based predictors (with hyperparameters as presented in Table 2), and the aggregated output is computed as an average of the ensemble models' predictions. In total, the ensemble consists of 216 individual models. To analyze the variability of the aggregated ensemble response, the models were recomputed 50 times, i.e., 50 independent and randomly initialized ensembles were formed. The diagram summarizing the process of training and testing models is presented in Figure 3.  , λMIN and λ1SE. The parameter λMIN corresponds to the models with the minimal mean squared error (MSE) of prediction. The parameter λ1SE corresponds to the models with the minimal number of non-zero coefficients and prediction MSE not greater than the minimal MSE computed during cross-validation, increased by its standard error [29]. Due to the random initialization of the ESNs, the models that are based on their activations ( and ) show increased variability of prediction quality when compared with the models that use only parameters computed from the time series ( ). To reduce the effect of initialization randomness on the prediction, an ensemble of models is established [30,31]. The ensemble is formed by combining ESN-based predictors (with hyperparameters as presented in Table  2), and the aggregated output is computed as an average of the ensemble models' predictions. In total, the ensemble consists of 216 individual models. To analyze the variability of the aggregated ensemble response, the models were recomputed 50 times, i.e., 50 independent and randomly initialized ensembles were formed. The diagram summarizing the process of training and testing models is presented in Figure 3.

Results
The Pearson's correlation coefficients between the signal-based parameters and the target scales are computed for the complete dataset and shown in Table 3. If correlations having absolute value greater than 0.1 are considered, then these are consistently higher for the tremor_score than for the UPDRS.21 scale. The log-transformed parameters PSDc_var_3Hz_log, avg_tQ_max_log and sd_tQ_max_log show higher absolute correlations than the respective source parameters, thus confirming the relevance of performing such a transformation. The highest absolute correlations are observed for avg_tQ_max_log (r = 0.73 for tremor_score and r = 0.52 for UPDRS.21), which supports the idea of computing tQ as a measure of tremor. The lowest absolute correlations are present for the parameters derived from the low-pass filtered V t . On the contrary, MSEn (τ ∈ {25, 50}) computed for the V t has one of the highest absolute correlations for both target scales. As a metric of prediction quality of the studied models, the correlations between their responses and the target scores are computed. Since the proposed models provide continuous outputs, the prediction quality is measured with Pearson's r correlation coefficient. The results for individual models are shown in Figure 4. The highest median of Pearson's r is achieved by models for the tremor_score and models for the UPDRS.21, in each case with λMIN regularization parameter. The models show greater variability of prediction quality when compared with models and models. Especially, the models show no (UPDRS.21) or a very limited number (tremor_score) of outliers, while numerous models have significantly lowered prediction quality (we define outliers as models with Pearson's r less than the lower quartile of the r distribution minus 1.5 times its interquartile range (IQR) or greater than the upper quartile plus 1.5 IQR).
The analysis of individual models provides information about general patterns of inclusion of the parameters from matrices DTS and DESN, with the latter partitioned in subsets of parameters , , @ and @ / . The results are presented in Figure 5. In the case of tremor_score, the λ1SE-regularized models show approximately equal contributions of DESN parameters' subsets, while in the λMIN-regularized models, the percentage of @ and @ / parameters is higher than and . Similar contributions of parameters may be observed in the case of models of the UPDRS.21 scale, which differ in having greater percentage of and parameters at the expense of decreased percentage of @ and @ / parameters. For both scales and with both types of regularization, models of type are dominated by DTS parameters, as their median count is greater than the respective median counts from the remaining groups of parameters. Inclusion of DTS parameters does not lead to a consistent change in the number of model parameters when compared with models. However, it reduces the spread of the parameter count (quantified by IQR), which is shown in Figure 6. The results for model ensembles and individual models are shown in Figure 7 and Table 4. The highest correlations of predictions and target scores are observed in the case of individual models. The maximum Pearson's r of models is equal to 0.638 (UPDRS.21, λMIN and λ1SE), 0.897 (tremor_score, λMIN) and 0.891 (tremor_score, λ1SE). Adoption of the ensemble approach leads to reduction of the maximum prediction quality in the case of best ensembles as compared with the best individual models (maximum values: r = 0.626 for UPDRS.21 (λMIN and λ1SE), r = 0.890 for tremor_score (λMIN) and r = 0.884 for tremor_score (λ1SE)). Nevertheless, it does essentially improve outcome in the worst-case scenario, as the ensembles outperform models of other The results for model ensembles and individual models are shown in Figure 7 and Table 4. The highest correlations of predictions and target scores are observed in the case of individual M ESN+TS models. The maximum Pearson's r of M ESN+TS models is equal to 0.638 (UPDRS.21, λ MIN and λ 1SE ), 0.897 (tremor_score, λ MIN ) and 0.891 (tremor_score, λ 1SE ). Adoption of the ensemble approach leads to reduction of the maximum prediction quality in the case of best M ESN+TS ensembles as compared with the best individual M ESN+TS models (maximum values: r = 0.626 for UPDRS.21 (λ MIN and λ 1SE ), r = 0.890 for tremor_score (λ MIN ) and r = 0.884 for tremor_score (λ 1SE )). Nevertheless, it does essentially improve outcome in the worst-case scenario, as the M ESN+TS ensembles outperform models of other types with respect to the minimum prediction quality (minimum values: r = 0.530 for UPDRS.21 (λ MIN ), r = 0.527 for UPDRS.21 (λ 1SE ), r = 0.839 for tremor_score (λ MIN ) and r = 0.827 for tremor_score (λ 1SE )). The results for model ensembles and individual models are shown in Figure 7 and Table 4. The highest correlations of predictions and target scores are observed in the case of individual models. The maximum Pearson's r of models is equal to 0.638 (UPDRS.21, λMIN and λ1SE), 0.897 (tremor_score, λMIN) and 0.891 (tremor_score, λ1SE). Adoption of the ensemble approach leads to reduction of the maximum prediction quality in the case of best ensembles as compared with the best individual models (maximum values: r = 0.626 for UPDRS.21 (λMIN and λ1SE), r = 0.890 for tremor_score (λMIN) and r = 0.884 for tremor_score (λ1SE)). Nevertheless, it does essentially improve outcome in the worst-case scenario, as the ensembles outperform models of other types with respect to the minimum prediction quality (minimum values: r = 0.530 for UPDRS.21 (λMIN), r = 0.527 for UPDRS.21 (λ1SE), r = 0.839 for tremor_score (λMIN) and r = 0.827 for tremor_score (λ1SE)).

Discussion
Automated quantification of the kinetic tremor with the presented multimodal approach provides best results in the case of predictive models based on the tremor_score scale. These models achieve maximum correlation between the predictions and original scores equal to 0.897 in the test set. Results of the UPDRS.21 modelling are inferior, as the highest Pearson's r for this scale is equal to 0.639.
Such a difference between outcomes for the two analysed scales originates from differences in their design. Whereas the tremor_score is dedicated solely for the quantification of kinetic tremor, the UPDRS.21 scale combines additional information about the postural tremor. During the circle-drawing task, the postural tremor may be recorded only if it propagates through the patient's body from different body segments. However, possibility of the postural tremor emergence during the examination is minimized, as the patient is seated and all of their not supported body parts, excluding a head, are involved in the drawing action. Therefore, while the presence of postural tremor during the UPDRS.21 examination changes the assigned score, it does not manifest itself in the recorded data and consequently increases the prediction error. Moreover, even observation of kinetic tremor during the aforementioned examination does not imply that a tremor with the same characteristics will be present in the course of circle-drawing task, as the kinetic tremor is an intermittent symptom.
Parameters computed directly from the time series have a significant impact on the quality of prediction. The M TS models, which are based exclusively on the D TS parameter set, show the lowest variance of prediction quality from all three types of individual models. Comparable variances are achieved with ensemble models, however at the expense of considerable growth of the parameter set. The rationale of this result is that the data-derived parameters are chosen with some field-specific knowledge about the quantified phenomenon, while the network-derived parameters are related to this phenomenon only indirectly, and as such have to be introduced to the model in greater numbers to compensate for their potentially weak predictive power.
We compare our results related to the tremor_score models to the results found in recent studies aimed at evaluation of the kinetic tremor with a digitizing tablet, where (1) a type of visual rating score (VRS) of spiral drawings was used as a reference and (2) proposed models had a linear output suitable for evaluation of its correlation with VRS.
We refer to two studies, the first being the work by Lin et al. [12], where the authors reported Pearson's correlations between the VRS and the investigated signal-derived parameters to be as high as 0.973. The models of VRS were trained and validated with digitized spiral drawings from ET and PD patients. Specifically, the training and test datasets consisted of approximately 140 and 30 recordings, respectively. In the second work, Legrand et al. [8] computed four time-and frequency-domain parameters using 92 spiral drawings acquired from 13 ET patients (the spirals were drawn with the dominant hand). The complete dataset was used for validation, as no model training was necessary. The resulting Pearson's correlations of parameters and VRS were between 0.79 and 0.87. Nevertheless, the authors note that from the proposed parameters that only the one showing r = 0.82 is suitable for analysis of drawings other than spirals.
As the best tremor_score model ensembles proposed in our study were characterized by the Pearson's r between 0.839 and 0.890, we state that our approach provides performance similar to these found in the referenced works. However, direct comparison of results is not appropriate, as there are some notable differences in the referenced methods, namely: (1) combining the data from ET and PD patients, (2) employing a spiral-drawing task and (3) being tested on significantly smaller datasets.
There is a limited number of works regarding modelling of the UPDRS.21 score based on the PD patients' digitized drawings. In the most relevant study, Saunders-Pullman et al. [32] selected spiral-related parameters showed Spearman's correlations between 0.40 and 0.24 with the combined UPDRS.21 score (a sum of scores of both arms). In addition, 100% of the dataset (74 PD patients, 10 spirals collected for each hand) was used to fit linear models and compute correlations. As in the case of tremor_score, we refrain from performing strict comparison of the aforementioned results with our findings, as the former were achieved for a spiral-drawing task and the Spearman's rather than Pearson's correlations were computed as a performance metric. However, we hypothesize that these linear models may be outperformed by the best of the UPDRS.21 model ensembles from our work, for which we have computed Pearson's r ranging from 0.530-0.626.
It is an open research question whether our approach would provide results of a similar quality if the data from ET patients were to be quantified. We recognize this generalizability issue as very important in the context of targeting the kinetic tremor, as this symptom is widespread in ET. Combining data from PD and ET patients would allow for better comparison with, e.g., work by Lin et al. [12]. However, one should be aware that, during the model fitting, such a combination may result in discarding parameters that are specific for only one of the diseases, yielding models that are more general, albeit with inferior prediction quality when tested solely in PD or ET populations.
Future research should include testing of the repeatability of the scores obtained with the proposed models. It would be beneficial to analyze how the predictions change when the models are trained with ratings from different raters. Moreover, before our method can be applied in its target environment in the form of clinically-usable models, it is necessary to prepare a refined dataset for their training. The main difficulty in establishing such a dataset is to ensure balanced inclusion of the samples representing tremors of different intensities, as in the PD patients the most severe tremors are less frequent than those of moderate severity.
Some of the possible applications of our method are in the following two areas of PD management: (1) tremor scoring when a patient is far from a medical facility and (2) providing additional feedback for a clinician who performs adjustments of deep brain stimulation (DBS) parameters. Tremor scoring in the absence of a trained clinician can be a useful tool supporting medical personnel in the case of teleconsultations. It also enables PD patients to supplement their self-reports with objective ratings of the kinetic tremor, e.g., allowing patients to record it exactly when it appears to be most prominent. As for the second possibility, increasing repeatability of the tremor assessment during the DBS programming is a factor supporting automation of this time-consuming process. The tremors of PD respond quickly to the changes in stimulation settings [33]; therefore, their ratings provide information relevant for tuning the DBS parameters.

Conclusions
The method presented in this study is based on a widely adopted idea of analysing kinetic tremor recorded with a digitizing tablet during tasks involving drawing or writing. Our main contribution is showing that the kinetic tremor can be effectively quantified with a method, which benefits from merging two modes of data parametrization: first, the parametrization derived from the expert knowledge, and, second, the parametrization that refers to the structure of a trained echo state network and its activation in response to the data. The verification performed with clinical data showed that the models using both types of the aforementioned parametrizations outperform the ones based only on a single class of the parameters. Moreover, we combined such individual models in model ensembles. The Pearson's correlations between the visual ratings of tremor and the model predictions in the best-performing model ensembles ranged from 0.839 to 0.890. The individual models from these ensembles showed more varied correlations between 0.737 and 0.897.