Advanced Statistical Analysis of 3D Kinect Data: A Comparison of the Classiﬁcation Methods

: This paper focuses on the statistical analysis of mimetic muscle rehabilitation after head and neck surgery causing facial paresis in patients after head and neck surgery. Our work deals with an evaluation problem of mimetic muscle rehabilitation that is observed by a Kinect stereo-vision camera. After a specific brain surgery, patients are often affected by face palsy, and rehabilitation to renew mimetic muscle innervation takes several months. It is important to be able to observe the rehabilitation process in an objective way. The most commonly used House–Brackmann (HB) scale is based on the clinician’s subjective opinion. This paper compares different methods of supervised learning classification that should be independent of the clinician’s opinion. We compare a parametric model (based on logistic regression), non-parametric model (based on random forests), and neural networks. The classification problem that we have studied combines a limited dataset (it contains only 122 measurements of 93 patients) of complex observations (each measurement consists of a collection of time curves) with an ordinal response variable. To balance the frequencies of the considered classes in our data set, we reclassified the samples from HB4 to HB3 and HB5 to HB6—it means that only four HB grades are used for classification algorithm. The parametric statistical model was found to be the most suitable thanks to its stability, tractability, and reasonable performance in terms of both accuracy and precision.


Introduction
Image analysis in medicine has been a popular topic in recent decades, and it is still evolving and finding further applications. Imaging techniques such as X-rays, computed tomography, and magnetic resonance imaging have been known and used for many years. However, new imaging methods have recently been developed (e.g., depth maps, 3D reconstructions, etc.) and their use in modern medicine is breathtaking.
Thus, a new area of image analysis in medicine is beginning, which is expected to be useful for diagnosis, rehabilitation, or validation of the results. This area does not aim to replace clinicians (this would not even be possible), but it does aim to save their time and help to interpret the results. A number of specific applications for biomedical image analysis can be found in the literature; for example, image analysis can be used to diagnose a large number of neurological diseases [1,2] or sleep apnea [3,4]. Furthermore, movement analysis has been used in rehabilitation [5].
It is possible to find a number of innovative diagnostic procedures based on image analysis. A research group based in Oklahoma University introduced a radiomics-based machine learning model to predict peritoneal metastasis in gastric cancer patients using a small and imbalanced computed tomography (CT) image dataset [6]. Meanwhile, 3D reconstruction also plays an important role in precise onco-surgery [7]. This paper is focused on 3D reconstruction and statistical analysis of mimetic muscle rehabilitation after head and neck surgery has caused facial paresis. The muscles of the face include all mimetic muscles innervated by the cranial nerve VII (facial nerve). Within the parotid gland, the facial nerve terminates by bifurcating into five motor branches. These innervate the muscles: temporal branch-innervates the frontalis, orbicularis oculi, and corrugator supercilii; zygomatic branch-innervates the orbicularis oculi; buccal branch-innervates the orbicularis oris, buccinator and zygomaticus muscles; marginal mandibular branch-innervates the mentalis muscle; cervical branch-innervates the platysma. Two masticatory muscles (masseter, temporalis) that are supplied by the motoric portion of the cranial nerve V3 (mandibular nerve) also contribute to the contour of the face [8].
In our analysis, we treat the measurements (i.e., trajectories of selected points on a face) as functional data. Functional data analysis has been applied in many diverse areas, including (bio)medical data. Let us recall the earlier review paper [9] and a more recent paper [10], to name just a few. Besides the parametric model, we also applied a nonparametric approach that is based on (ordinal) random forests. A very recent application of random forests for clinical risk prediction based on complex data can be found in [11].
Clinicians require an objective, reliable, and valid clinical tool to accurately describe a patient's facial function, to monitor status over time, and to assess the course of recovery and the effects of treatment. For a grading system to have clinical usefulness, it must be easy to administer, require little time or equipment, and be sensitive enough to detect clinically important changes [12].
This work builds on previous publications and the work of several research groups dealing with similar topics [13].

Biomedical Background
Facial nerve palsy has a significant physical and emotional toll on affected individuals. A thorough history and physical examination are needed to narrow the broad differential diagnosis and to determine an appropriate management plan [14].
We distinguish two major types of facial nerve palsy: central (upper) motoneuron lesion between cortex and nuclei of the facial nerve in the brainstem and peripheral (lower) motoneuron lesion between nuclei in the brainstem and peripheral organs. The most common cause of lover motoneuron lesion is idiopathic facial nerve palsy, also known as Bell's palsy. Closely following Bell's palsy are infection and inflammation. Trauma, including surgical trauma in head and neck surgery, is the third most common cause of facial nerve paralysis in the general population. Other important etiologies of facial nerve dysfunction include herpes zoster oticus and neoplasms of the parotid gland, brain, and the petrous part of the temporal bone [15].
Cases of complete paralysis after surgery in which the onset of paralysis is indeterminate should be treated as immediate in nature. Delayed paralysis or incomplete paresis should be treated medically, with high-dose steroids. A good prognosis should be anticipated in these cases [16].

Electrodiagnostics of Facial Nerve Palsy
Electrophysiological tests are mainly used to determine the severity and prognosis of a peripheral facial nerve lesion [17].
Electroneurography objectively records the amplitude of electrically evoked muscle action potentials. It analyses the evoked compound muscle action potential (CMAP) of a specific facial muscle after transcutaneous stimulation of the main trunk of the facial nerve [18]. The main trunk is stimulated supramaximally at its exit from the stylomastoid foramen with a bipolar stimulator or stimulating electrodes. The CMAP is recorded using a bipolar pair of surface electrodes placed on the target muscle. Important test between 72 h and 21 days after onset, interpretation of result in comparison to needle electromyography (nEMG) result [19]. Nerve injury is expressed as percentage of function relative to the normal side [16].
Electromyography (EMG) measures volutional responses of the facial muscles without electrostimulation. A facial motoneuron unit consists of a facial motoneuron and all muscle fibers innervated by this motoneuron. Needle EMG (nEMG) is the method used to analyse a facial motor unit action potential (MUAP) recorded from a needle electrode inserted in the facial muscle. This examination is important 2-3 weeks after onset of the palsy, because pathologic activity can occur in case of facial nerve degeneration. In the later time course, nEMG is important to detect reinnervation potentials as signs of reinnervation of the facial muscles [18]. Surface electromyography (sEMG) works with voluntary activity of the facial muscles and not with external stimulation. The recording field is more superficial and larger than when using nEMG. sEMG is not used for prognostication. Multichannel sEMG is important if the interplay of different facial muscles should be analysed.
Blink-reflex testing is a test that allows stimulation of the facial nerve proximal to the lesion site. Testing is electrostimulation of the supraorbital branch of the trigeminal nerve (V1) and simultaneous sEMG recording from the orbicularis oculi muscle on both sides. Standard blink testing involves electrical stimulation of the supraorbital nerve on the affected side combined with a 2-channel simultaneous sEMG recording from both orbicularis oculi muscles. The exit of the supraorbital nerve in the supraorbital foramen is palpated on the rim of the orbit [20]. It may be most helpful if facial nerve damage is suspected to occur within the brainstem [18].

Facial Rehabilitation
Timely intervention is needed to provide patients with the best chance for recovery of facial nerve function [14].
The pharmacological treatment depends on the cause of facial nerve paresis and generally most often are used corticosteroids, antiviral agents, calcium channel blockers, vitamins to support regeneration of the nerve. Neuromuscular electrostimulation therapy is used for the direct or indirect therapeutic stimulation of nerves, muscles. In general, three stimulation frequency ranges can be distinguished: low-frequency, medium-frequency, and high-frequency currents. The lowfrequency stimulation currents (up to about 1000 Hz) are suitable for creating synchronous muscle contractions. Stimulation can be sensory or motor, but it should always be below the discomfort threshold. The current is applied via electrodes. These can either be inserted into the tissue surrounding a nerve, or into the muscle (percutaneous stimulation) or into the skin applied (transcutaneous stimulation) [21].
One of the most important parts of treatment is facial rehabilitation, which typically includes five main components, as follows:

1.
Patient education to explain the pathologic condition and set realistic goals; 2.
Soft tissue mobilisation to address facial muscle tightness and edema; 3.
Functional retraining to improve oral competence; 4.
Synkinesis refers to the abnormal, unwanted, involuntary facial movement that occurs coupled with purposeful facial movement. For example, oro-ocular synkinesis occurs when movement of the lips results in a closure of the eyelids. Mild forms of synkinesis may go undetected, but severe forms cannot be ignored because they may cause severe facial pain and facial tightness. The treatment of facial synkinesis is one of the most challenging aspects of facial paralysis care [14].

Evaluation of Facial Nerve Function
For clinicians, it is essential to evaluate the function of the facial nerve objectively. In everyday practice, the most commonly used tool is the standard grading system, such as the House-Brackmann scale. This system involves a six-point scale (with I being normal and VI total) of flaccid paralysis (Table 1). Group I represents normal facial movement with no weakness or synkinesis. A patient placed in group II has only slight asymmetry of facial movements with a possible slight synkinesis. Patients in group III have an obvious asymmetry with obvious secondary defects but some forehead movement. The presence of forehead movement indicates that there has not been total degeneration of the nerve. Patients in group IV have an obvious asymmetry, no forehead movement, and weakness with possible disfiguring synkinesis or mass action. When there is only slight movement of the face, no forehead movement, and not enough facial function to return to have secondary defects, the patient is placed into group V. The absence of any movement or tone places the patient in group VI [23]. The House-Brackmann scale produces comparable results between different observers in patients with normal or only mildly impaired facial nerve function. Interobserver variability can increase depending on the severity of facial nerve paresis. However, House-Brackmann does not promote uniformity of reporting and comparison of outcomes in patients with moderate or severe facial nerve paresis [24].

Materials and Methods
Our study included 93 patients, with 122 undergoing head and neck surgical procedures with the specific risk of postoperative facial nerve dysfunction. The data were collected from 3.2.2019 to 14.5.2020.
Every patient was measured over a defined schedule of check-ups (the first one before the surgery and then repetitively based on a defined schedule). The patient is asked to perform a series of exercises using the mimetic muscles during the examination, and the clinician evaluates this exercise numerically. The main disadvantage of this method is that these measurements are strongly subjective. Figure 1 shows scheme of the whole process from data acquisition to model development.

Data Acquisition
A mobile robotic system and a special software developed at UCT Prague were used for data acquisition Figure 2. This system can operate in a static mode for sensing mimetics (this issue) or in a dynamic mode for sensing gait and body movements (another project solved at UCT Prague).
Kinect v2 and RealSense are used for image acquisition. The final version of this robot uses only RealSense D435 manufactured by Intel. The Kinect sensor had to be replaced because Microsoft discontinued production in 2018 and subsequently support ceased at that time without the possibility of replacement.
The robot has undergone several versions during its development over the past months, see Figure 2. Several control loops are located in the robot, which control the robot in the correct direction and distance from the patient. The basic loop is located directly in the controllers of both axles and it regulates the speed of rotation of individual wheels, which it obtains from the superior processor. The master processor then processes the odometric data from both axles and runs another loop in it, which controls the total distance travelled by all wheels based on a comparison with the theoretical position obtained from the robot model.
The robot includes adaptive cruise control, which responds to changes in the distance of the patient from the robot and to the desired position and direction of the robot in the recording corridor. The direction and position in the corridor is obtained from data from the lidar, the distance of the patient from the robot is obtained from both the lidar and the stereo camera. Consequently, in the case of failure of one of the sensors, it is possible to control the system without interruption.
During the recording, the robot stores all of the data that could be needed for a retrospective analysis of both the patient's gait or mimics and any robot failure. Currently, the system can store all of the operating data, as follows: voltage on individual cells of both LiPol batteries, odometry of each wheel, raw data from lidar, stereoscopic recording of the patient's gait, and video recording of gait.

Data Preprocessing
The whole process of data preprocessing is thoroughly described in a recent paper [13]. To make this paper self-contained, let us briefly review this process here.
While being recorded by Kinect, the patients conduct their exercises at different times and at different speeds. Consequently, a temporal registration of the data is necessary for further statistical analysis. For that purpose, the original curve P i (t) representing movement of any facial point for measurement (patient) i is transformed using a time warping function w i in such a way that curves P i are aligned for all i.
The warping functions w i are computed by a landmark registration technique, where the beginnings and end of each repetition are identified and a piecewise cubic interpolation is used for the warping function. A more detailed description of the used registration technique can be found in [13].
In the clinician's House-Brackmann evaluation, classes HB4 and HB5 are seldom used compared to others due to their similarity with HB3 and HB6, respectively. To balance the frequencies of the considered classes, we reclassified the samples from HB4 to HB3 and HB5 to HB6.
Some exercises exhibited insufficient range of motion detectable by Kinect. In the following analysis, we consider these exercises: raising eyebrows, frowning, smiling, baring teeth, and pursing lips.
To reduce the amount of data obtained by Kinect and to access only the important behaviour, we chose several points of interest (POI) for the selected exercises and we computed various indicators from the mutual position of the selected POI at each time instant. The resulting indicator curves measure symmetry (movements in the left-half versus movements in the right-half of the face), intensity (the range of motion during the exercise), and speed (how fast the selected exercise is performed, measured by the warping function). The exact definition of all computed indicators is described in [13].

Compared Statistics
On one hand, we deal with the classification problem with very complex explanatory data (collection of curves for each observation) having unclear functional link to the response variable, which suggests using non-parametric statistics (or neural networks) as a preferred option. On the other hand, the number of observations is very limited, which increases the risk of overfitting for these methods. Therefore, we decided to compare parametric, non-parametric statistics and neural networks, which is less prone to overfitting, but requires very careful model specification to get reasonable results.

Parametric Statistical Model
Detailed description of the parametric statistical model for HB classification can be found in [13]. Let us recall that within this approach, the curves of the indicators are analysed by two parametric statistical models in two steps: • The first step consists of application of functional logistic regression (FLR) to indicator curves (understood as functional data) separately for each indicator type (exercise). This turns the indicator curves (functional data) into health scores (real-valued data between 0 and 1). • In the second step, classification of the set of health scores for each patient (multivariate data) into HB grades is performed using multivariate ordinal logistic regression (OLR).

Functional Logistic Regression
Logistic regression is a popular model that comes from the methodology of generalised linear models (GLM), which is characterised by Bernoulli-type error distribution (binary output variable Y) and logit link function. Functional logistic regression is a generalisation of this methodology to functional input data (functional covariates). A functional datum (a curve) is linearly transformed into real-valued linear predictor (by taking scalar product in L 2 space). The linear predictor is then turned into probability of a positive outcome P(Y = 1) by application of the inverse link function. The corresponding formula is In our setting, X k represents the curve of the particular indicator in the k-th sample (a functional covariate) and p k is the predicted probability of the k-th sample being healthy (i.e., having HB 1). We interpret these p k as health scores ("rate of healthiness") for a specific indicator and we then use them in the next step as covariates for HB classification. The parameters of the model include the scalar intercept α and the functional parameter β, both are estimated from the data by maximum likelihood approach.
More details about the functional approach can be found in [25]. For our calculations, we made use of an implementation of FLR in the statistical software package R (function classif.glm within the fda.usc package), see [26] for more details.

Multivariate Ordinal Logistic Regression
Given that we face the problem of classification with multiple, ordered, classes (HB grades) on the basis of the vector of health scores, we have chosen the apply the multivariate Ordinal Logistic Regression (OLR). This is a parametric statistical tool that is derived from the classical logistic regression. Whereas the classical logistic regression requires binary explanatory variable, OLR makes it possible to model ordinal explanatory variable (multiple classes with ordering). In particular, OLR applies a series of classical logistic regressions to cumulative probabilities to estimate the ordinal version of cumulative distribution function of the explanatory variable. Probabilities of individual classes (values of explanatory variable) are then predicted by taking the differences of the cumulative probabilities.
In our framework, the explanatory variable represents HB evaluation with ordered HB grades: HB1 < HB2 < HB3 < HB6. Regression models for cumulative probabilities take the form: The output P(HB k ≤ j) is the probability that the k-th sample has HB grade j or better, α j is the (HB specific) intercept parameter, β i is the coefficient for the health score of the i-th indicator (independent of the HB level) and p ik is the health score of the i-th indicator and the k-th sample. The parameters α j and β i are estimated from the data by maximum likelihood method.
To avoid overfitting and to provide stable model predictions, we performed a stepwise variable selection procedure based on minimising the AIC (considering both directions, starting from the empty model).
To compensate for the imbalance of the input data, we applied weighted version of multivariate OLR, where the weights of samples were set as the inverse value of the frequency of the corresponding class.
The probability of a sample's having a specific HB grade is calculated as the difference of cumulative probabilities: The resulting predicted class for a sample is determined as the class with the highest probability.
More details of this method can be found in [27] (which are referred to there as a cumulative logit model). We performed our calculations of OLR using the polr function from the MASS package in R (see [28] for further details).

Non-Parametric Statistical Model
In this subsection, we describe the methodology of non-parametric statistical model for our problem. Non-parametric models are based on different philosophy compared to the parametric approach that was described earlier. Instead of specifying the types of regression functions a prior and calibrating their parameters from the data, the form of the functional dependence of the response variables on the covariates is constructed directly on the data. In general, this approach is typically less sensitive to model misspecification (compared to parametric approach) but requires more data to provide stable prediction. This is the reason why we have chosen this approach as an alternative for comparison of the parametric approach.
Similarly to the parametric approach, we split the modeling process into two steps: • In the first step, we apply kernel functional classification to turn curves of individual indicators into real-valued health scores. • In the second step, HB grades are predicted from the lists (vectors) of health scores using the ordinal forests method.
Splitting the modeling process into these two steps is advantageous for potential further use by practitioners because it makes the whole process more tractable and it enables the practitioners to see the performance of the patient in each exercise, which underlies the final HB classification; see also [13].

Kernel Functional Classification
Kernel classification is a very popular technique for non-parametric supervised classification because of its flexibility: it generalises the concept of k-nearest neighbour. The functional version of the kernel classification is discussed in detail in the well-known and frequently cited book [29]. The main idea is to determine posterior probabilities of the groups for a new observation (new curve) according to its proximities to curves in the training dataset with a known group. The proximities are calculated via a kernel (denote K) is applied to a distance between curves (d) with respect to a bandwidth (h).
Although functional kernel classification can be used for the multiple classes problem, we use it for prediction of a probability of class HB 1 (the health score), for the reasons described earlier. In particular, we have where p N is the estimated posterior probability that new observation (new indicator curve) X N has HB1, X i for i = 1, . . . n are indicator curves in training set with known HB grade denoted as HB i , the indicator function 1 [HB i =1] equals one if HB i = 1 and is zero otherwise, K is normal (Gaussian) kernel, h is the bandwidth optimised in order to minimise loss function (misclassification rate) on the training set and d is the L 2 distance defined as We calculate the health scores p N for each indicator (exercise) and both for observations in the training set (for the purpose of calibration of ordinal forests in next step) and in the testing set (for the purpose of final classification by the calibrated ordinal forests).
Our calculations were implemented in the statistical software R, while kernel classification was done via the function classif.kernel within the fda.usc package); see [26] for more details.

Ordinal Forests
The ordinal forest is an efficient non-parametric tool based on random forests (RF), which is tailored to construct a classification rule for problems with multivariate input data (both low-dimensional and high-dimensional) and with ordinal response variable. This technique has recently been introduced in [30] and it fits perfectly with our need to construct a non-parametric classifier for ordered HB grades from lists of health scores.
The basic idea of ordinal forests is to replace the ordinal response variable Y by a latent continuous variable Y * so that Y depends on (and is uniquely determined by) Y * via a nondecreasing step function. In our setting, this means that individual HB grades correspond to adjacent intervals of Y * . The values of Y * (and its intervals) are optimised with the aim of maximising the prediction performance on the training set. We then grow a regression forest with Y * as a continuous response variable.
For a new observation, we start with prediction of Y * from the regression forest and we then determine the corresponding Y (HB grade in our case).
For our calculations related to ordinal forests, we utilised the software R package; namely, the ordfor function (implemented within the ordinalForest package) with default values of hyperparameters.

Neural Networks
Two deep neural network (DNN) structures were trained on the given data. The core of the first DNN structure, which is further referred to as GRU DNN, are three recurrent layers with Gated Recurrent Unit [31] cell architecture (for full pipeline see Figure 3). Given the nature of the data, the main strength of GRU DNNs is the ability to infer temporal patterns and dependencies in the time-series, if any are present. The second DNN structure, which is further referred to as CNN DNN, utilises three two-dimensional convolutional layers (for full pipeline see Figure 4). Convolutional layers have a strong history in feature extraction and pattern recognition [32,33], both of which are expected of the data. They also allow for an effective dimensionality reduction and, thus, computational efficiency.  Both DNN structures have been trained and evaluated on the same data as the statistical methods. The parameters were optimised by minimising cross-entropy loss function using the Adam optimiser. Accuracy and F1-score, tracking both precision and recall, were used to measure performance on the validation data. For more information about performance metrics, we refer the reader to [34].
To account for imbalance in class distribution of the training data, the loss function was scaled by class weights, which were calculated based on the ratio of samples in each class.
To counteract overfitting, which was expected given the small size of the dataset, a static dropout rate of 0.5 was applied after the feed-forward layer directly following the core structure (see Figures 3 and 4). The training process was also stopped early if the average value of area under curve (AUC) on the validation data did not improve for 20 subsequent epochs and only the best performing parameters were saved.
The cross-validation results are categorised by the DNN structure type (GRU or CNN) and an indicative number of trainable parameters of the whole structure.

Results
The overall classification results of parametric OLR model and non-parametric random forest (RF) model are summarised in Table 2. The neural network classifications for selected structures are shown in Tables 3 and 4.   Given that our goal is not to precisely predict clinician's HB scores because they may be subjective, we consider both the correct classification of HB classes (model HB equals to clinician's HB) and the approximate classification (we tolerate the model HB to differ by at most 1 from the clinician's HB class). The accuracy (or recall) of individual HB classes and the overall accuracy are depicted in Table 5. By accuracy of an HB class, we denote the fraction of correctly classified samples by the model out of all samples in that HB class (assigned by a clinician). The overall accuracy is the fraction of correctly classified samples in all classes out of all samples.
To complete the evaluation of the multi-class classification, Table 6 shows precision calculated for all HB classes individually. The precision of an HB class represents the probability that a sample given the HB class by a model is assigned the same class by a clinician.  The parametric OLR model achieves an acceptable accuracy of exact classification and a sufficient accuracy of approximate classification, displaying only 20% of misclassified samples (predicted HB class differs by more than 1 from the clinician's evaluation). The rates indicate balanced accuracy for all four HB classes, not preferring the dominant HB1 class. Meanwhile, the correct accuracy of the individual HB classes is not consistent in the non-parametric model random forests because the model mainly predicts the dominant HB1 class. There is also much higher number of misclassified cases (31%).
The accuracy of classification by neural networks differs substantially based on the architecture and number of trainable parameters. The GRU architecture provides sub-optimal results for both low and high number of parameters (see Table 5) with as much as 33% and 39% misclassified cases, respectively. In contrast, CNN based architectures achieved accuracies that are comparable to the parametric OLR model and benefit from higher amount of trainable parameters. The small (160 thousand parameters) CNN misclassified in 27% of test cases, while the large (4 million parameters) in only 21%. It is important to note that the number of trainable parameters of the presented neural network models is substantially higher when compared to the size of the training set, which makes them highly susceptible to overfitting.
A comparison of the overall accuracy on the testing set with accuracy on the training datasets (for each step of cross-validation procedure) reveals that the OLR model provides stable results and does not suffer from overfitting, see Table 7. Meanwhile, the nonparametric model random forests displays overfitting: it achieves worse accuracy for testing set compared to accuracy on training sets. Non-parametric methods usually require (by their nature) rich training dataset to learn the systematic patterns and filter out nonsystematic (random) noise. Otherwise, due to their flexibility, they tend to adapt to non-systematic random effects. As expected, neural network models exhibit strong overfitting to the training set for similar reasons as the non-parametric random forest model. However, due to early stopping and dropout regularisation, they can still achieve testing set accuracy that is on par with the parametric OLR model.
The main advantage of the OLR parametric model over the neural network approach lies in the possibility to interpret intermediate steps of the classification process. This provides clinicians with more information on the recovery process, it also enables them to control and possibly adjust the automated evaluation procedure; see [13].

Discussion
The following figures show the resulting comparison of these approaches. Figure 5 represents a "theoretical" exact classification (perfect fit required), which is not so crucial for clinicians. Figure 6 shows the results of the so-called approximate classification (classifier differs from the clinician by at most 1 HB grade). This classification is most widely used in the clinical setting.
The overall comparison of the methods is then plotted in Table 5. It is evident from the overall accuracy row of the approximate classification that the OLR classification method gives the best results of all the selected approaches on these data.
In comparison of the models by HB grades, we considered two approaches: accuracy (Table 5) and precision ( Table 6). The former is calculated from the row-wise distributions in a confusion matrix and it relates to most frequent approach (probability of correct classification given the true HB). The latter is calculated from the column-wise distributions in a confusion matrix and relates to the Bayesian approach (probability of correct classification given the predicted HB).

Conclusions
We developed a tool for statistical analysis of mimetic movements of patients after head and neck surgery. This tool is intended for the objective assessment of the patient's rehabilitation process, which has so far only been assessed subjectively by a clinician.
We compared a number of approaches: • Parametric statistics (based on OLR) • Non-parametric statistics (based on ordinal random forests) • GRU • Convolutional neural network The parametric statistical approach based on the combination of functional logistic regression and ordinal logistic regression provided stable and reasonable results for both accuracy and precision. This demonstrates its potential to become a widely acceptable and useful method. The pros and cons of individual algorithms are summarised in Table 8. Various approaches were used for data analysis, starting with standard continuous statistical data analysis and ending with convolutional neural networks. The reliability of the individual approaches was tested and compared. The resulting application is currently successfully used in the rehabilitation of patients with so-called vestibular swannoma at the Clinic of Otorinolaryngology, University hospital Královské Vinohrady, Prague.

Future Work
Future plans in the framework of this research focus mainly on the following • Better data acquisition: As part of data collection and pre-processing, it is expected that the operator will be immediately informed about the quality and usability of the record; • Unsupervised learning: In the next step, the data will be processed independently of clinical practice, which is heavily burdened by the subjective opinion of the physician; • Fusion with the project devoted to gait analysis: The data from these experiments are fused with data from the analysis of gait of the same patients (these patients often suffer from balance problems in addition to mimic problems); • Mobile application: In addition to a full-fledged application in a hospital environment, it is also planned to create a mobile application (using the camera system of a smartphone), which will be available to patients for home rehabilitation.