Writhing Movement Detection in Newborns on the Second and Third Day of Life Using Pose-Based Feature Machine Learning Classification

Observation of neuromotor development at an early stage of an infant’s life allows for early diagnosis of deficits and the beginning of the therapeutic process. General movement assessment is a method of spontaneous movement observation, which is the foundation for contemporary attempts at objectification and computer-aided diagnosis based on video recordings’ analysis. The present study attempts to automatically detect writhing movements, one of the normal general movement categories presented by newborns in the first weeks of life. A set of 31 recordings of newborns on the second and third day of life was divided by five experts into videos containing writhing movements (with occurrence time) and poor repertoire, characterized by a lower quality of movement in relation to the norm. Novel, objective pose-based features describing the scope, nature, and location of each limb’s movement are proposed. Three machine learning algorithms are evaluated in writhing movements’ detection in leave-one-out cross-validation for different feature extraction time windows and overlapping time. The experimental results make it possible to indicate the optimal parameters for which 80% accuracy was achieved. Based on automatically detected writhing movement percent in the video, infant movements are classified as writhing movements or poor repertoire with an area under the ROC (receiver operating characteristics) curve of 0.83.


Introduction
Neuromotor development is a process aimed at achieving full mobility and independence by an infant. It runs in a planned order, so that in healthy developing infants, the sequence of movements is the same and repetitive. At the same time, it has a cascade character, consisting of the fact that after reaching one motor activity, another one follows, while the dynamics of development is not uniform and is based on the so-called development leaps. However, child development is a continuous and coherent process in which the skills achieved in individual development spheres are correlated [1]. Many factors can disrupt this process. It is therefore important that early and comprehensive diagnostics detect them as early as possible to receive appropriate treatment and therapy [2]. from birth to the age of one year. The measure of objectivity is the comparability of the results obtained by different evaluators [12]. In this respect, the scales rely on the therapist's perception, which is limited for objective reasons. Further development of research tools must consequently involve the widest possible use of innovative technological solutions, including IT tools, which can significantly support the diagnosis in terms of the quantity of data and the possibility of calculations, measurements, and forecasts [13].
Computer-aided diagnosis of newborns is an area that has been intensively developed in recent years [14]. The starting point for experiments is most often the diagnostics based on the GMA as it does not require a direct interaction between the person conducting the examination and the child. Information about spontaneous motor activity of a child has been collected by researchers in various ways, e.g., using wearable sensors [15] such as accelerometric [16,17], IMU [18,19], and electromagnetic sensors [20]. Although the wearable sensors allow for obtaining good results in the analysis of adults movement [21], the question of the effect of body-mounted elements and their wiring on the child's spontaneous movements remains debatable. Due to the increasing capabilities of computer image processing, various solutions based on video analysis [22] and using optical systems for depth image acquisition (RGB-D) [23] are currently being proposed. Compared to sensor data, the use of video-based data reduces the impact of the measurement on the child, offers more opportunities for practical use, and ensures better results [24]. In individual studies, authors have attempted to identify irregularities, mostly by reducing the problem to the classification of the recording into the normal and abnormal groups. The proposed solutions can be divided according to the method of the extraction of features describing the child's movement into those based on features extracted directly from the recording (optical flow, background subtraction) [25,26] and pose-based features, in which the extraction of features is preceded by the process of locating individual body segments [27,28]. Currently, the extraction capabilities of pose-based features have improved due to the availability of ready-made human pose estimation libraries such as OpenPose [29], whose accuracy in the case of infant movement analysis has been confirmed in various studies [28,30].
In the present study, the approach based on video recordings of newborns is shown. These data were the necessary basis for the diagnostics using the GMA. Due to the very low age of the examined newborns, the data acquisition process had to be performed in the neonatal unit. The measurement stand has been prepared in such a way that it is possible to easily carry out the recording procedure by the hospital staff. The use of additional wearable sensors would extend the length of additional procedures for installing sensors on the child's limbs and running data acquisition software. It would also be necessary to monitor the condition of the sensors and ensure that they are always ready to work, e.g., by keeping the battery charge high. The use of active sensors in this case could also raise parents' doubts about the infant's safety, which would make it more difficult to obtain enough data.

Aim of the Study
The aim of the present study is to use machine learning methods to detect writhing movements on video recordings of children on the second and third day after birth. The authors attempted to determine the indices of spontaneous motor activity of a child that allow for its quantitative description. The proposed parameters must be normalized in such a way that their values do not depend on the parameters of the camera, the child's body dimensions, and its position relative to the camera. Another criterion that must be met is the possibility of the easy interpretation and reference to the observation made by the diagnostician. Based on the parameter values obtained for different time intervals, we attempted to use popular machine learning methods for the detection of writhing movements.

Materials and Methods
The study was approved by the Biomedical Research Ethics Committee (No. 5/2018) and in accordance with the Declaration of Helsinki. The study was approved by the Bioethics Committee of Research of the Jerzy Kukuczka Academy of Physical Education in Katowice.

Dataset
The study used a database of 125 recordings of newborns collected in cooperation with the Neonatal Unit at Public Hospital in PiekaryŚląskie, Poland. The measurement stand consisted of a 1 m × 1 m tabletop and a frame with a camera mounted 1 m above the tabletop surface. The stand was equipped with a HDR-AS200V video camera (Sony Corporation, Tokyo, Japan), enabling recording at a spatial resolution of 1920 × 1080 px at a 60 fps sampling rate.
The study involved children on the 2nd or 3rd day after birth. The inclusion criteria in the experiment were the uneventful course of pregnancy and the children who had been born full-term (38-42 weeks) with an Apgar score of 8-10. After obtaining the parent's written consent to participate in the study, the hospital staff started the recording procedure. The child in the hospital bed was placed on the tabletop of the stand, and then, the camera was activated. During the examination, the child was under the supervision of the researcher. If a child cried, the recording was interrupted and, if possible, repeated later. The recommended time for correct recording was at least 10 min.
The collected recordings were evaluated by an expert for compliance with the active wakefulness states 3 (eyes open, no movements) and 4 (eyes open, gross movements), according to Prechtl's classification of states [31]. The videos that showed longer fragments of crying or sleeping of a child were removed from further analysis. In some cases, the caregiver's intervention to calm the child by rocking, stroking, feeding, or giving a pacifier was observed during the recording, which also excluded the recording from further analysis. Meeting the recommended criteria for effective GMA proved to be extremely difficult in the test conditions. Of all 125 recordings, thirty-six were included in further analysis. These recordings were then evaluated by 5 experts in the field of diagnostics using the GMA and divided into groups with normal WM and PR movements. In 26 cases, the experts were in full agreement on the decisions made (WM: 14, PR: 12). In the case of divergences in classification, the final decision was made jointly. Eventually, the dataset was divided into 17 recordings containing normal WM and 14 containing PR movements. Differences between the experts in the other 5 cases determined the exclusion of these recordings from the classification stage ( Figure 1). In the recordings from the normal WM group, each expert independently identified the parts in which the child presented the writhing movement. Due to the discrepancies between the experts, training data preparation required the determination of one set of WM fragments for each recording, in the form of a common part for N experts ( Figure 2). Depending on the number of experts for whom the common part was selected, the number and length of the WM fragments changed. The WM time in the training set was longer for low N values. Increasing the number of experts shortened it, while increasing the confidence of the classification. Selection of the appropriate N parameter was one of the evaluation elements of the proposed models.

Video and Pose Preprocessing
The study used the pre-processing methodology proposed by the authors in the previous pilot study on computer-aided infant movement analysis. In order to reduce the impact of the camera's optical system, the recordings were subjected to a procedure of distortion removal. The camera parameters were determined by calibration using the checkerboard pattern. Then, from the whole recording, the area containing the hospital bed with a newborn was manually indicated [13].
Pose estimation was performed using the OpenPose library [29]. The processing yielded 25 landmarks indicating the location of individual joints and characteristic points. Results with low detection confidence were replaced by previous point locations. The observation of the detected points revealed the occurrence of vibrations in the form of high-frequency noise resulting from the inaccuracy of the detection of a given point on subsequent frames of the video. Removal of these disturbances without affecting the overall movement trajectory was performed using the Savitzky-Golay filter, which enables smoothing the signal without significant distortion [32]. In order to reduce the impact of the arrangement and the changes in the body position on the results, the coordinates of the points from each frame were translated so that the neck point did not change its position during the recording. Then, the entire body was rotated in such a way that the axis of the child's body was parallel to the axis of the image. A similar approach for the pre-processing of the OpenPose results in the analysis of infant recordings was previously proposed by various research teams [13,28,30].

Feature Extraction
In previous studies, we verified the possibilities of describing the scope, nature, and location of limb movements of children in the period of fidgety movements using the parameters of the ellipse circumscribed on the movement trajectory [13]. A methodology for normalization of the proposed parameters based on limb length was also developed, making it possible to use them for the objective description of limb movement. This allowed for the observation of their variability depending on the head position during a few minutes of observation [33]. In the present experiment, the previously proposed features of the FMA (factor of movement's area), FMS (factor of movement's shape), and CMA (center of movement's area) were determined using shorter analysis time: from 5 to 30 s, and different times of overlapping of the subsequent analysis frames.
Using the trajectories of the distal parts of the limbs, a set of features was determined based on the parameters of the ellipse circumscribed on each trajectory. In the first stage of processing, the effect of single deviations in the trajectory was reduced. For all locations of a point vs. time, its Euclidean distance from the trajectory centroid determined as a mean value from each coordinate was calculated. Then, based on the interquartile range rule, points with a distance greater than Q 3 + 1.5IQR were deleted. The next step was downsampling used to reduce the accumulation of trajectory points at the moments of limb stops. The ellipse parameters were determined based on the eigenvectors and eigenvalues of the covariance matrix of other trajectory coordinates. The pre-processing and its effect on the resulting shape and position of the ellipse are shown in Figure 3. The values of the determined features expressed in the domain of the analyzed image depend on the camera parameters and the length of the child's body and limbs. The individual features were normalized to obtain an interpretable value in relation to the actual movement parameters, regardless of the factors indicated. The independence from the proportions of the child's body also allows increasing the reliability of the comparison of the values obtained between the individual children. The procedure for measuring the length of a child's limbs in the domain of the image was performed automatically based on the location of the joints. For each frame, the Euclidean distance was determined between three consecutive points: shoulder, elbow, and wrist for upper limbs and hip, knee, and ankle for lower limbs. Next, the maximum value was selected, corresponding to the situation where the limb was placed parallel to the image plane.
FMA was obtained by dividing the area of the ellipse expressed in pixels by the area of the circle with a radius corresponding to the limb length. Its value can be interpreted as the ratio of the motion performed to the maximum possible range of motion. The FMS parameter corresponds to the aspect ratio of the ellipse determined by dividing the length of the minor axis by the length of the major axis. The position of the center of the ellipse was related to the coordinate system associated with the proximal joint of the analyzed limb: shoulder joint for the upper limb and hip joint for the lower limb. The unit of the system is determined by the length of the limb; the sense of the vertical axis is always oriented towards the head, whereas the sense of the horizontal axis depends on the analyzed side and points to the distal direction (from the body axis) (Figure 4). The FMA, FMS, and horizontal and vertical components of CMA give 4 values describing the movement of each limb. In the classification process, these 4 values for 4 limbs were concatenated into a vector with 16 features characterizing spontaneous movement of the infant.

Classification
The problem of detecting the instants in the recording when the newborn performed writhing movements is considered a binary classification problem. Training data were prepared based on the time intervals of writhing movements defined by 5 experts in the individual videos. Feature extraction was performed for different lengths of the analysis window. The resulting feature vector containing FMA, FMS, and CMA for each limb contained 16 values of features. Each analyzed window was marked as WM (writhing movement) or OM (other movement) depending on the proportion of frames marked as WM by experts.
A leave-one-out cross-validation was performed to exclude one recording from the training set. Then, the training data from the other 30 videos were used to train the machine learning algorithm. The result of the model for the test case allowed for the determination of the evaluation metrics such as accuracy, sensitivity, specificity, and F1 score [34]. Three machine learning algorithms were selected for classification: support vector machine (SVM) with the radial basis function (RBF) kernel, random forests (RF), and a classifier based on linear discriminant analysis (LDA). The testing procedure was conducted for different time frames and overlapping times. In overlapping moments, the final classification was based on the more frequently indicated class, and in the case of equal division, the fragment was classified as other movements.

Results
The first step of the analysis of fragments containing WM was to compare experts' agreement on the time of their occurrence on recordings. For this purpose, for each pair of results, the accuracy and F1 score parameters were calculated from all 17 recordings containing WM (Table 1). The mean accuracy between the experts was 81.89, with an average F1 score of 79.44. For the whole set of 31 WM and PR recordings, feature extraction was performed for 5, 15, and 30 s non-overlapping time windows. Next, leave-one-out cross-validation was carried out five times, changing the N parameter affecting the training dataset. Based on the results, the sensitivity and specificity were determined for a given N in relation to the ground truth. The results showing the relationship between the true positive rate (TPR) and false positive rate (FPR) for individual classifiers are presented in Figure 5. Subsequent analyses were extended by overlapping time, which was changed to have 1 /2 or 2 /3 of the window length. Figure 6 presents the results of individual classifiers for a time window of 15 s with overlapping of 0, 7.5, and 10 s, respectively. The increase in N for LDA and RF led to an increase in sensitivity while reducing specificity. For SVM, the increase in N did not negatively affect specificity, allowing reaching high sensitivity values. The use of the overlapping time of 10 s and N = 3 improved the sensitivity of each classifier, with similar or higher specificity values. The exact values of sensitivity and specificity and the corresponding accuracy and F1 score are presented in Table 2.   A comparison of the classification results with the division indicated by each expert was performed by calculating the accuracy and F1 score between the results for the entire set of videos containing WM ( Table 3). These results can be compared with Table 1 using the classifier as the next expert. The last column shows the average value of each parameter. Accurate detection of normal writhing movements and their differentiation from poor repertoire movements should allow classifying recordings according to the division made by experts. The criterion for group division was the percentage of detected WM movements in a given recording. The receiver operating characteristics (ROC) curve was obtained by increasing the threshold value and determining the sensitivity and specificity for each classifier (Figure 7). The resulting area under the ROC curve (AUC) values are shown in Table 4.

Discussion
This study attempted to automatically detect normal writhing movements based on the proposed set of original features using pose estimation of video recordings. In the study, the experts analyzed the recordings of 125 healthy newborns on the second and third day of life in order to select a group meeting the criteria for inclusion in the classifier preparation stage. Eventually, at least 10 min recordings of active waking were obtained in the group of 36 newborn babies included in the study. The restrictive criteria adopted at this stage of the examinations were aimed at obtaining a reliable set of data allowing for the proper validation of the proposed automated detection methodology. Fragments from rejected recordings may be included in future research to extend the methodology proposed in the study.
The classification of recordings into normal writhing movements and poor repertoire movements by five experts suggests the high reliability of the division obtained. This is particularly important due to the age of newborns in the study group. The application of general movement assessment in the first week of life is a big challenge and is characterized by a low diagnostic value in the context of the future prevalence of abnormalities during neurological development [5]. A large percentage of newborns develop normal writhing movements after about seven days of life, previously showing poor repertoire movements [9]. This is confirmed by the division into recordings containing WM and PR in the analyzed group of healthy newborns.
Several tests were performed, involving leave-one-out cross-validation of three classifiers: SVM, RF, and LDA, using the proposed numerical indicators (FMA, FMS, and CMA) describing the range, nature, and location of the movements of individual limbs. The change of the window length and overlapping time and the training set by modifying the N parameter (Figure 4) influenced the obtained values of sensitivity and specificity (Figures 5 and 6). Based on the analyses, an optimal set of parameters was selected: window length of 15 s with an overlapping time of 10 s and N = 3 due to a good ratio of sensitivity and specificity for the SVM classifier (Table 2). For the proposed features, low sensitivity values were obtained using RF and LDA. The analysis of individual cases showed that the decrease in sensitivity is mainly due to problems with the detection of the entire WM fragments in specific recordings by all three classifiers. The SVM is better at detecting longer and consistent fragments of WM, whereas RF and LDA find a few smaller fragments, while omitting a greater number of whole WM fragments. The indication N = 3 confirms the thesis that more than one expert should be consulted. The increase in N between one and three in each case resulted in a significant increase in sensitivity and, in the case of SVM, also an increase in specificity. The indication of the exact beginning and end of the movement was a problematic task for the experts because in practice, this information is not important for making an accurate diagnosis. The choice of a common fragment for a greater number of experts eliminated the effect of uncertain redundant fragments on the learning process. The use of overlap in WM detection is justified by the way physiotherapists work, since in practice, when they make a decision concerning a given instant of a recording, they take into account the state before and after the fragment.
The comparison of WM indications as an effect of individual classifiers was compared with the indications of individual experts taking into account only recordings from the group containing WM ( Table 3). The highest values of the accuracy and F1 score were obtained for the SVM. These values are lower than the parameters comparing indications between experts (Table 1). This comparison allows for verification of the capability of the classifier to distinguish normal writhing movements from other movements (or rest) that are not poor repertoire movements. The lower values are mainly due to the SVM's failure to indicate long WM fragments in several recordings.
The use of the proposed methodology to classify entire recordings into those containing normal WM and those containing poor repertoire was verified by drawing the ROC for a variable percentage threshold of WM proportion in the recording (Figure 7). This aimed to check the possibility of automated WM classification in the screening indication of the recordings containing poor repertoire movements (positive class). AUC values above 0.8 were obtained for all three classifiers. The parameter was affected in this case by the high specificity of each classifier. In the best case for the SVM classifier, twenty-six recordings were classified correctly, with three false positives and two false negatives. The AUC value of 0.83 for this classifier can be compared with the classification made by experts. In five of 36 cases, the decision on proper classification was not obtained. The remaining resolved 31 expert classifications gave 86% of certain decisions in the division of recordings into those containing WM and those containing PR.
To the best of the authors' knowledge, this study is the first attempt to indicate fragments of WM on recordings of newborns in the first days of life. In the cases considered so far, the prevalence of fidgety movements in newborns from the age of two months has been considered more frequently, especially in the context of early detection of the risk of cerebral palsy [35]. An approach similar to that presented in this study was used by Ihnen et al. [27], who analyzed cerebral palsy prediction in a large group of recordings of children over nine weeks of age. Their training set was built based on the classification of the entire recordings, and then each 5 s of the recording were classified. The division of recordings was made based on the diagnosis of CP in a child in the next period of life, into CP and non-CP classes. The proposed model was built based on features determined using the trajectory of selected body segments for every five seconds of recording. The classification of the recording was based on the percentage of CP fragments, reaching AUC = 0.87. In a study by McCay et al. [28], an attempt was made to automatically divide short recordings from the MINI-RGBD database [23] into normal and abnormal movements based on pose-based features. The proposed histogram-based features ensured the high accuracy of the automated grouping of 12 recordings using SVM, LDA, and different neural network architectures. The results indicated high accuracy of the classification of spontaneous movements in newborns based on features describing only the movement of limbs using neural networks. The problem of the automated detection of normal writhing movements and poor repertoire movements was addressed in a study by Tsuji et al., which discussed the classification of 30 s fragments of recordings into four types of general movements [26]. Using a log-linearized Gaussian mixture network and features based on the motion image, the authors achieved high accuracy to distinguish between normal and abnormal movements. However, only previously selected fragments of video recordings containing general movements were used in the study, without periods of crying, sleep, or lack of movement, which, compared to the analysis of the whole recording, could have contributed to an increase in accuracy.
The latest publications in the field of computer-aided diagnostics of infants indicate the direction of further development towards solutions based on human pose estimation algorithms [26,27]. The results of the research using this approach show the great potential of the OpenPose library for this type of application [28,30,36].
This study proposed intuitive spatio-temporal pose-based features describing spontaneous child motor activity and confirmed the possibility of their use in the detection of writhing movements in recordings of children on the second and third day of life. Due to the lack of newborns with confirmed neurodevelopmental abnormalities in the analyzed group, the effectiveness of the proposed method in diagnostics support must be confirmed by subsequent studies. Another limitation of the present study is the size of the group, which should be increased, in particular, with high-risk newborns. The proposed features describe infant's motor activity in only two dimensions, which results in a loss of information that may be important for a quantitative description. In further stages of the development of the presented solution, the authors plan to extend the set of features with parameters describing the change in the position of the body trunk and head. During experts' observation, this information plays an important role in decision making, so it may positively affect the automatic classification results. It is also planned to extend the range of classification algorithms tested, in particular with different neural network architectures and deep learning approaches. Further research also assumes the use of useful fragments of recordings rejected at this stage. It is also necessary to develop a methodology for the initial indication of moments of movement without crying of the child, which, combined with a greater diversity of data sets, should yield better results.