Automatic, Qualitative Scoring of the Clock Drawing Test (CDT) Based on U-Net, CNN and Mobile Sensor Data

The Clock Drawing Test (CDT) is a rapid, inexpensive, and popular screening tool for cognitive function. Despite its qualitative capabilities in the diagnosis of neurological diseases, assessment of the CDT has depended on quantitative as well as manual, paper-based methods. Furthermore, with the advancement of mobile smart devices embedding several sensors and running deep learning algorithms, the need for a standardized, qualitative, and automatic scoring system for the CDT has increased. This study presents a mobile phone application, mCDT, for the CDT and suggests a novel, automatic and qualitative scoring method using mobile sensor data and deep learning algorithms: a convolutional neural network (CNN); U-Net, a convolutional network architecture for biomedical image segmentation; and the MNIST (Modified National Institute of Standards and Technology) database. To obtain DeepC, a trained model for segmenting a contour image from a hand-drawn clock image, U-Net was trained with 159 CDT hand-drawn images at 128 × 128 resolution, obtained via mCDT. To construct DeepH, a trained model for segmenting the hands in a clock image, U-Net was trained with the same 159 CDT images at 128 × 128 resolution. To obtain DeepN, a trained model for classifying the digit images from a hand-drawn clock image, a CNN was trained with the MNIST database. Using DeepC, DeepH and DeepN together with the sensor data, the parameters of contour (0–3 points), numbers (0–4 points), hands (0–5 points), and center (0–1 point) were scored for a total of 13 points. Performance testing was completed with images and sensor data obtained via mCDT from 219 subjects. For an objective performance analysis, all the images were scored and cross-checked by two clinical experts in CDT scaling.
Performance test analysis yielded a sensitivity, specificity, accuracy and precision for the contour parameter of 89.33, 92.68, 89.95 and 98.15%, for the hands parameter of 80.21, 95.93, 89.04 and 93.90%, for the numbers parameter of 83.87, 95.31, 87.21 and 97.74%, and for the center parameter of 98.42, 86.21, 96.80 and 97.91%, respectively. These results suggest that the mCDT application and its scoring system provide utility in differentiating dementia disease subtypes, and are valuable in clinical practice and for studies in the field.


Introduction
As a sub-test of the Mini-Mental State Examination (MMSE), the Clock Drawing Test (CDT), along with the Pentagon Drawing Test (PDT) and the Rey-Osterrieth Complex Figure test (ROCF), has been widely used in neuropsychology and related areas for neurological and cognitive examinations for dementias such as Alzheimer's disease and others [1][2][3][4][5][6][7]. For the CDT, a qualitative neurological drawing test commonly used as a screening instrument for cognitive capabilities, a subject is asked to draw a clock showing a specific time. Placement of the numbers around the clock contour requires visual-spatial, numerical sequencing and planning abilities [8]. Drawing hands indicating a specific time requires long-term attention, memory, auditory processing, motor programming, and frustration tolerance [8]. Hence, in the CDT, the test subject uses various cortical areas at the same time for the task being performed; these include the frontal, parietal, and temporal lobes [9,10]. As such, the CDT engages various cognitive skills such as selective and sustained attention and visuospatial skills.

The collected data were subjected to testing of the application's scoring method. Two neurologists, as the experts in CDT scaling, assisted in scoring the CDT and cross-checked all the images for an objective performance analysis, as well as gathering the CDT data from the 140 patients with Parkinson's disease (PD). The Institutional Review Board of the Hallym University Sacred Heart Hospital, acting as an independent ethics committee (IEC), ethical review board (ERB), or research ethics board (REB), approved the data gathering and the protocols used for this study (IRB number: 2019-03-001). Table 1 provides the age, gender, and binary CDT score summary of the 238 volunteers and 140 PD patients.

Implementation of the Deep Learning Based Mobile Clock Drawing Test, mCDT
The Android Studio development environment was used to develop the deep learning based mobile application mCDT for the clock drawing test. A user of mCDT draws the face of a clock with all the numbers present and sets the hands to a specific time, such as 10 after 11. Here, the clock face contour can be pre-drawn by mCDT as an option chosen by the user, and the specific time is randomly selected by mCDT. Then, mCDT scores the drawn image qualitatively; this scoring is based on mobile sensor data of the drawing image and the pre-trained models DeepC, DeepH and DeepN created in this study. Fast and precise segmentation of the clock face contour and the hands in the images was accomplished by DeepC and DeepH, respectively, using U-Net, a convolutional network architecture. In turn, DeepN classifies the numbers using a CNN and the MNIST database. The mobile sensor data of x and y coordinates in pixels, timestamps in seconds, and touch events for each sample of the drawing image are recorded at a 50 Hz sampling frequency. Three types of touch events, 'up', 'down', and 'move', were considered in mCDT. The touch event 'down' occurs when the user starts touching the screen; 'up', when the touch ends; and 'move', when the user moves the finger or the pen across the screen. Figure 1a provides the flow chart of the processes by mCDT. Figure 1b-d provide the screenshots of the registration window, the CDT window, and the result window of mCDT, respectively. As shown in Figure 1a,b, an informed consent prompt appears at the launch of mCDT, followed by a registration window for entering the subject's information; these include age, name, gender, education level and handedness of the subject, plus optional parameters including an email address.
After pressing the start button in the registration window, the CDT window appears as shown in Figure 1c, and the user is instructed to draw numbers and hands on a clock face contour; the contour of the clock face is either drawn by the user or pre-drawn by mCDT, as an option chosen by the user. In the drawing, the clock hands have to be set to a specific time given randomly by mCDT. The sensor data are saved as the subject draws the contour, numbers and hands of a clock face on the touch screen of the CDT window. The sensor data, along with the drawn image, are then provided in the results window as shown in Figure 1d. The results can then be forwarded to the email address submitted at the registration window.


Pre-Trained Models, DeepC, DeepH and DeepN Based on the U-Net and the CNN
Novel pre-trained models DeepC, DeepH and DeepN were developed for the segmentation and classification of the contour, the hands, and the numbers, respectively, of the clock face from a drawn clock image. DeepC and DeepH were created based on the U-Net convolutional network architecture, and DeepN was based on the CNN in Keras [6]. The U-Net and CNN network architectures implemented in this study are illustrated in Figures 2 and 3, respectively. The U-Net network architecture consists of a contracting path, an expansive path, and a final layer, as shown in Figure 2. The contracting path consists of repeated applications of two 3 × 3 convolutions and a 2 × 2 max pooling operation with stride 2 for down-sampling. At each repetition, the number of feature channels is doubled. The expansive path consists of two 3 × 3 convolutions and a 2 × 2 convolution ("up-convolution") for up-sampling to recover the size of the segmentation map. At the final layer, a 1 × 1 convolution is used to map each 16-component feature vector to the desired number of classes. In total, the network has 23 convolutional layers. The training data for both DeepC and DeepH contain 477 images of 128 × 128 resolution, which were resized from the original 159 images of 2400 × 1200 resolution and augmented using the ImageDataGenerator module in keras.preprocessing.image. The augmentation was carried out by randomly translating horizontally or vertically with the parameter value 0.2 for both width_shift_range and height_shift_range. DeepC and DeepH were both trained for 100 epochs, reaching accuracies of about 77.47% and 79.56%, respectively. The loss function used for training was binary cross-entropy.

The CNN network architecture consists of two convolution layers (C1 and C3), two pooling layers (D2 and D4), and two fully connected layers (F5 and F6), as shown in Figure 3. The first convolution layer C1 filters the 28 × 28 input number image with 32 kernels of size 5 × 5, while the second convolution layer C3 filters the down-sampled 12 × 12 × 32 feature maps with 64 kernels of size 5 × 5 × 32. A unit stride is used in both convolution layers, and a ReLU nonlinearity is applied at the output of each. Down-sampling occurs at layers D2 and D4 by applying 2 × 2 non-overlapping max pooling. Finally, the two fully connected layers, F5 and F6, have 1024 and 10 neurons, respectively. The MNIST handwritten digit database (about 60,000 images) and the digit images from the 477 CDT images were used to train this CNN architecture to obtain the trained model, DeepN.
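The feature-map sizes quoted above (28 → 24 → 12 → 8 → 4) can be verified with simple arithmetic. The following stand-alone sketch assumes unpadded ("valid") convolutions and non-overlapping pooling, which is what that size progression implies; the helper names are illustrative, not from the paper:

```python
def conv_out(size, kernel, stride=1):
    # output side length of a 'valid' (unpadded) convolution
    return (size - kernel) // stride + 1

def pool_out(size, window):
    # output side length of non-overlapping pooling
    return size // window

s = 28                   # MNIST input image side
s = conv_out(s, 5)       # C1: 28 -> 24 (32 channels)
s = pool_out(s, 2)       # D2: 24 -> 12
s = conv_out(s, 5)       # C3: 12 -> 8 (64 channels)
s = pool_out(s, 2)       # D4: 8 -> 4
flat = s * s * 64        # flattened features entering F5
print(s, flat)           # 4 1024
```

Note that the flattened size 4 × 4 × 64 = 1024 is consistent with F5 having 1024 neurons.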

Figure 3. CNN network architecture for this study. It is made up of two convolution layers (C1 and C3), two pooling layers (D2 and D4), and two fully connected layers (F5 and F6); both convolution layers use a unit stride with ReLU outputs, D2 and D4 perform 2 × 2 non-overlapping max pooling, and F5 and F6 have 1024 and 10 neurons, respectively.

Scoring Method of mCDT
The novel, automatic and qualitative scoring method of mCDT was developed based on the sensor data and the pre-trained models DeepC, DeepH and DeepN. Four parameters were included in the scoring method: contour (0-3 points), numbers (0-4 points), hands (0-5 points), and the center (0-1 point). Some of the scoring criteria were adopted from a previous study by Caffarra et al. [22]. The total score, corresponding to the sum of the individual parameter scores, ranges from 0 to 13. When a subject executes the CDT more than once, the best copy is scored. A detailed list of the scoring criteria of each parameter used in this study is presented in Table 2, and the overall flowchart of the scoring method is shown in Figure 4.

Two forms of data, the sensor data and the clock drawing image, are generated for output once mCDT has been completed, as shown in Figure 4a. The clock drawing image, I_C, is intended to be of a clock face with numbers, hands and a contour. From the original 2400 × 1200 pixel drawing image at the CDT window, the clock drawing image I_C is resized to 128 × 128 pixels. Time stamps t[n] in seconds, x- and y-coordinates x[n] and y[n] in pixels, and touch-events e[n] of the sensor data for the 128 × 128 drawing image have a sampling rate of 50 Hz, with n being the index of a sample point. Each touch-event e[n] has one of the values −1, 0, or 1: the value −1 designates the event 'down', when the screen starts being touched; 1 designates the event 'up', when the screen touch ends; and 0 designates the event 'move', when touching and moving on the screen continue.
The sensor data x[n] and y[n], c_i ≤ n ≤ c_f, belonging to the contour in the clock drawing image I_C are obtained using the touch-events e[n], c_i ≤ n ≤ c_f, where c_i and c_f are the start and end indices of the contour, respectively. These can be estimated by the touch-event down-shifting from the event 'down' into the event 'move' and the touch-event up-shifting from the event 'move' into the event 'up', respectively. In addition, the touch-events e[n], c_i < n < c_f, between the down- and up-shiftings have to stay in the event 'move' continuously for the longest time if such a period occurs more than once. In other words, the longest continuous sequence of 0s in the touch-events e[n], starting with the value −1 and ending with the value 1, identifies the samples belonging to the contour in the clock drawing image I_C.
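The "longest run of 'move' events bracketed by 'down' and 'up'" rule above can be sketched as a small stand-alone function. This assumes the −1/0/1 event coding described earlier; the exact boundary convention (whether c_i includes the 'down' sample itself) is an assumption, and the function name is illustrative:

```python
def contour_span(events):
    """Return (c_i, c_f): the index range of the longest run of
    'move' (0) samples bracketed by a 'down' (-1) and an 'up' (1)."""
    best = None
    i, n = 0, len(events)
    while i < n:
        if events[i] == -1:                  # touch-event 'down'
            j = i + 1
            while j < n and events[j] == 0:  # consecutive 'move' samples
                j += 1
            if j < n and events[j] == 1:     # stroke closed by 'up'
                if best is None or (j - i) > (best[1] - best[0]):
                    best = (i + 1, j - 1)    # indices of the 'move' run
            i = j
        else:
            i += 1
    return best

# three strokes; the second (five 'move' samples) is the longest,
# so it is taken as the contour
e = [-1, 0, 0, 1, -1, 0, 0, 0, 0, 0, 1, -1, 0, 1]
print(contour_span(e))  # (5, 9)
```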
The contour image I_fc is segmented from the clock drawing image I_C using the pre-trained model DeepC. Next, the percentage p_fc of the segmented image I_fc matching the corresponding portion of the clock drawing image I_C is estimated by Equation (1):

p_fc = n(I_C,c_i≤n≤c_f ∩ I_fc) / n(I_C,c_i≤n≤c_f) × 100,  (1)

where n(I_C,c_i≤n≤c_f ∩ I_fc) is the number of pixel coordinates that I_fc and the contour image I_C,c_i≤n≤c_f have in common, and n(I_C,c_i≤n≤c_f) is the total number of pixel coordinates in the sensor data belonging to the contour image.
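Equation (1) is a simple set-overlap ratio. A minimal sketch, assuming the two images are represented as sets of (x, y) pixel coordinates (a representation chosen here for illustration):

```python
def overlap_percentage(contour_pixels, segmented_pixels):
    """Equation (1): share of contour pixels reproduced by the
    segmented image, as a percentage."""
    common = contour_pixels & segmented_pixels
    return 100.0 * len(common) / len(contour_pixels)

# toy example: 4 contour pixels, 3 of them recovered by segmentation
I_contour = {(0, 0), (0, 1), (1, 0), (1, 1)}
I_fc      = {(0, 0), (0, 1), (1, 0), (5, 5)}
print(overlap_percentage(I_contour, I_fc))  # 75.0
```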
A modified clock drawing image is generated, as shown in Figure 4b, by redrawing the remaining part of the clock drawing image I_C after excluding the sensor data x[n], y[n], t[n], c_i ≤ n ≤ c_f, belonging to the contour, and then binarizing it so that the background is black and the number digits are drawn white.
The hand image I_fh is separately segmented from the clock drawing image I_C using the pre-trained model DeepH. Next, the percentages p^k_fh, k = 1, 2, of the segmented image I_fh matching the corresponding portion of the clock drawing image I_C are estimated by Equation (2), analogously to Equation (1). The sensor data x[n] and y[n], h^k_i ≤ n ≤ h^k_f, k = 1, 2, belonging to one of the hour and minute hand images in the modified clock drawing image are obtained using the touch events e[n], h^k_i ≤ n ≤ h^k_f, k = 1, 2, where h^k_i and h^k_f are the start and end indices of the hand, respectively; these can be estimated by a touch-event down-shifting from the event 'down' into the event 'move' and by the touch-event up-shifting from the event 'move' into the event 'up', respectively.

Here, x_c_min and y_c_min are the minimum, and x_c_max and y_c_max the maximum, x- and y-coordinates in pixels of the contour sensor data. The x- and y-coordinates x_c_mid and y_c_mid of the center point of the contour are defined by the bisecting point of the minimum and maximum x- and y-coordinates, as formulated by Equations (7) and (8), respectively:

x_c_mid = (x_c_min + x_c_max)/2,  (7)
y_c_mid = (y_c_min + y_c_max)/2.  (8)

Table 3 summarizes the corresponding formula for each of the pre-estimated positions P_k, k = 1, 2, . . . , 12, of the number digits. Table 3. Formulas of the pre-estimated positions of the number digits 1 to 12.
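Equations (7) and (8), and the idea of pre-estimated digit positions P_k, can be sketched as follows. The center-point computation follows the bisecting-point definition above; the digit-placement formula is a hypothetical stand-in (digit 12 at the top, angles increasing clockwise), since the exact formulas of Table 3 are not reproduced here:

```python
import math

def center_point(xs, ys):
    # Equations (7) and (8): bisecting point of the extreme coordinates
    x_mid = (min(xs) + max(xs)) / 2
    y_mid = (min(ys) + max(ys)) / 2
    return x_mid, y_mid

def pre_estimated_position(k, x_mid, y_mid, radius):
    # Hypothetical placement of digit k (1..12) on the clock face:
    # digit 12 at the top, 30 degrees per digit, clockwise; y grows downward
    angle = math.radians((k % 12) * 30)
    return (x_mid + radius * math.sin(angle),
            y_mid - radius * math.cos(angle))

x_mid, y_mid = center_point([10, 90], [20, 80])
print((x_mid, y_mid))              # (50.0, 50.0)
x3, y3 = pre_estimated_position(3, x_mid, y_mid, 40)
print(round(x3, 1), round(y3, 1))  # 90.0 50.0  (digit 3 sits to the right)
```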

Next, each of the number images I^j_C, j = 1, 2, . . . , N, corresponding to a digit is cropped out from the modified clock drawing image using the function findContours() of OpenCV2, where N is the total number of digit images cropped out and j is the index sorted by the time stamps in ascending order. Here, the function findContours() is used for finding the outer contours of white objects on a black background.

The closure of the contour is identified if the maximum value d_c_max of the distance d(p_c_i, p_c_f) between the first and the last contour sample points and the distances d(p^k_sh, p^(k+1)_sh) between the kth and the (k + 1)th sample points shifting down or up in the touch events, where the index k is sorted by the time stamp of the sample points, is smaller than a given threshold θ_c2. The appropriateness of the contour size is evaluated by the ratio of the size of the contour to that of the CDT window. The contour size A_c in pixels is calculated by the expression A_c = (x_c_max − x_c_min)(y_c_max − y_c_min) using x_c_min, x_c_max, y_c_min and y_c_max. The appropriateness of the contour size is identified if the ratio A_c/W_c is larger than a given threshold θ_c3, where W_c is the size in pixels of the CDT window.

The number sequence S_N of the classified digits is obtained and compared to reference number sequences reflecting general human drawing habits. Three types of ordering in drawing the numbers were considered as reference number sequences: drawing the numbers 1 through 12 in ascending order; starting from 12 and then drawing 1 through 11 in ascending order; or starting from 12, 3, 6, and 9 and then inserting the remaining numbers in ascending order. Therefore, the reference number sequences considered were S_1 = [1, 2, . . . , 9, 1, 0, 1, 1, 1, 2], S_2 = [1, 2, 1, 2, . . . , 9, 1, 0, 1, 1], and S_3 = [1, 2, 3, 6, 9, 1, 2, 4, 5, 7, 8, 1, 0, 1, 1].
The correctness of the order of the numbers is identified if the maximum value R_seq of the matching ratios between the number sequence S_N and the reference sequences S_i, i = 1, 2, 3, is greater than a given threshold θ_n.
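The order check above can be sketched as follows. The three reference sequences are taken verbatim from the text (multi-digit numbers split into single digits, e.g. 10 → 1, 0); the positionwise matching ratio is a simplifying assumption, since the paper's exact definition of the matching ratio is not spelled out here:

```python
def r_seq(drawn, references):
    """Maximum percentage of positionwise digit matches between the
    drawn digit sequence and each reference sequence."""
    best = 0.0
    for ref in references:
        hits = sum(a == b for a, b in zip(drawn, ref))
        best = max(best, 100.0 * hits / max(len(ref), len(drawn)))
    return best

refs = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 0, 1, 1, 1, 2],  # 1..12 ascending
    [1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 0, 1, 1],  # 12 first, then 1..11
    [1, 2, 3, 6, 9, 1, 2, 4, 5, 7, 8, 1, 0, 1, 1],  # 12, 3, 6, 9, then the rest
]
drawn = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 0, 1, 1, 1, 2]
print(r_seq(drawn, refs))  # 100.0
```

A drawn sequence matching any reference perfectly scores 100.0, which exceeds the heuristic threshold θ_n = 65.00 used in the worked examples.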

Scoring on Criteria of Numbers Parameter

Indication of the correct proportion of the hands is evaluated using the hands sensor data and Equations (9) and (10). The positioning of the numbers is identified if the radius R_cN of the fitted circle F_cN is smaller than the radius R_cL of the fitted circle F_cL. Figure 6 shows the flowchart suggested in this study for scoring the presence of two hands or one hand, the correctness of the proportion of the hands, the correctness of the hour target number, and the correctness of the minute target number.


Assignment of Scores
Presence of two hands or one hand is evaluated by the percentage p_fh of the segmented image I_fh matching the corresponding portion of the clock drawing image I_C. The presence of two hands is identified if the value of the percentage p_fh is larger than a given threshold θ_h1; the presence of one hand is identified if the value of p_fh is larger than a given threshold θ_h2. Here, the value of θ_h2 is smaller than that of θ_h1, since DeepH is trained with images of the clock face with two hands, so that the criteria for two hands are included in the criteria for one hand.
Indication of the correct proportion of the hands is evaluated using the hands sensor data. Two different cases are considered here: one in which the presence of two hands is identified, and the other in which the presence of only one hand is identified. For the first case, the hour-hand sensor data S_h, the set with the larger data size of the two sets H_1 and H_2, is fitted into a line, and the fitted line is extrapolated within a range [y_h,min, y_h,max] of y pixel coordinates, where y_h,min is the minimum value of the y coordinates y[n], h_i ≤ n < h_f, in the hands sensor data and y_h,max is the maximum of the y coordinates y[n], c_i ≤ n < c_f, in the contour sensor data. Likewise, the minute-hand sensor data S_m, the set with the smaller data size of the two, is fitted into a line, and the fitted line is extrapolated within the same range [y_h,min, y_h,max]. For the second case, the whole hands sensor data set is fitted into a line, and the fitted line is extrapolated within the range [y_h,min, y_h,max] of y pixel coordinates.
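The line fitting used for the hands can be sketched with an ordinary least-squares fit. This is a minimal stand-alone sketch; fitting y as a function of x is an assumption made here for simplicity (a near-vertical hand, e.g. pointing at 12, would need the axes swapped, and the text's extrapolation over a y-range suggests the paper may do exactly that):

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b through a hand's sample points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    return a, b

# hand stroke samples lying exactly on y = 2x + 1
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```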

Scoring Criteria of Center Parameter
Presence or inference of the center point of the clock face in the drawing image I_C is identified, as shown in Figure 4, if the presence of two hands or one hand is identified. Also, presence or inference of the center is identified if there is a data point within a given range from the center point P(x_c_mid, y_c_mid). Table 4 lists the conditions for assigning scores for each parameter in mCDT; the heuristic values of all the thresholds used in this study are summarized in the footnote.

Table 4. Details for assignment of scores.

The score of the contour parameter is determined via the percentage p_fc, the maximum contour closure distance d_c_max, and the ratio A_c/W_c of the contour size to the CDT window size. The score of the circular contour is 1 if the percentage p_fc is greater than a given threshold θ_c1; the score of the closed contour is 1 if the maximum contour closure distance d_c_max is smaller than a given threshold θ_c2; and the score of the appropriately sized contour is 1 if the ratio A_c/W_c is greater than a given threshold θ_c3.
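The three binary contour criteria combine into the 0-3 point contour score. A minimal sketch, using the heuristic threshold values quoted in the worked examples below (θ_c1 = 75.00, θ_c2 = 50.00, θ_c3 = 0.1); the function name is illustrative:

```python
THETA_C1 = 75.0   # minimum match percentage for a circular contour
THETA_C2 = 50.0   # maximum closure gap in pixels
THETA_C3 = 0.1    # minimum contour-to-window size ratio

def contour_score(p_fc, d_c_max, area_ratio):
    """Sum of the three binary contour criteria (0-3 points)."""
    circular = p_fc > THETA_C1
    closed = d_c_max < THETA_C2
    sized = area_ratio > THETA_C3
    return int(circular) + int(closed) + int(sized)

# Figure 7a: closed, circular, appropriately sized -> 3 points
print(contour_score(95.74, 9.67, 0.339))   # 3
# Figure 7b: circular and sized, but not closed -> 2 points
print(contour_score(89.66, 55.56, 0.337))  # 2
```

These values reproduce the totals reported for Figures 7a-d below.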
The score of the numbers parameter is determined by the contour sensor data x[n] and y[n], the classified output of DeepN, and the corresponding thresholds.

Scoring on Criteria of Contour Parameter
In Figure 7a, where the original image is of a closed circular contour sized appropriately, the segmented image has an estimated percentage p_fc of 95.74%, the maximum contour closure distance has an evaluated value d_c_max of 9.67 pixels, and the ratio of the contour to the CDT window size has an estimated value A_c/W_c of 0.339. Both the closure and the circularity of the contour were evaluated to be one, as p_fc was greater than 75.00, the heuristically set threshold θ_c1, and d_c_max was less than 50.00, the heuristically set threshold θ_c2. The size of the contour was also evaluated to be one, as A_c/W_c was greater than 0.1, the heuristically set threshold θ_c3. Therefore, the total score of the contour parameter was evaluated to be three.

Figure 7b has an original drawing image of a circular contour sized appropriately, but not wholly closed. The segmented image has an estimated percentage p_fc of 89.66%, the maximum contour closure distance has an evaluated value d_c_max of 55.56 pixels, and the ratio of the contour to the CDT window size has an estimated value A_c/W_c of 0.337. The closure of the contour was evaluated to be zero, as d_c_max was greater than the threshold of 50.00 for θ_c2; however, both the circularity and the size were gauged to be one, as p_fc was greater than the threshold of 75.00 for θ_c1 and A_c/W_c was greater than the threshold of 0.1 for θ_c3. Therefore, the total score of the contour parameter was evaluated to be two.
Figure 7c shows an example of an original drawing of an appropriately sized, but neither closed nor circular, contour. The segmented image has an estimated percentage p_fc of 52.31%, the maximum contour closure distance has an evaluated value d_c_max of 51.56 pixels, and the ratio of the contour to the CDT window size has an estimated value A_c/W_c of 0.237. Both the closure and the circularity of the contour were evaluated to be zero, as p_fc was not greater than the threshold of 75.00 for θ_c1 and d_c_max was greater than the threshold of 50.00 for θ_c2; however, the size was gauged to be one, as A_c/W_c was greater than the threshold of 0.1 for θ_c3. Therefore, the total score of the contour parameter was evaluated to be one.
Finally, the original drawing image of Figure 7d depicts an example of a closed circular contour, but not sized appropriately. The segmented image has an estimated percentage p_fc of 97.44%, the maximum contour closure distance has an evaluated value d_c_max of 32.01 pixels, and the ratio of the contour to the CDT window size has an estimated value A_c/W_c of 0.061. Both the closure and the circularity of the contour were evaluated to be one, as p_fc was greater than the threshold of 75.00 for θ_c1 and d_c_max was less than the threshold of 50.00 for θ_c2; however, the size was gauged to be zero, as A_c/W_c was less than the threshold of 0.1 for θ_c3. Therefore, the total score of the contour parameter was evaluated to be two.

Scoring on Criteria of Hand Parameter
The analytical ability of the pre-trained model DeepH is demonstrated in Figure 9.

Scoring on Criteria of Numbers Parameter

Figure 8 presents worked examples of the numbers parameter. In Figure 8a, the classified output D[j], j = 1, 2, . . . , N, of DeepN, the pre-trained model, was in the range from zero to nine, that is, 0 ≤ D[j] ≤ 9 for all j, and the numbers n(D[j] = 1) and n(D[j] = 2) were five and two, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be one. The maximum ratio R_seq in percentage between the number sequence S_N and the reference sequences S_i, i = 1, 2, 3, was evaluated to be 100.00%. Therefore, the correctness of the order of the numbers was evaluated to be one, since the value 100.00 of the maximum ratio R_seq was greater than 65.00, the heuristically given value of θ_n. The ratio n(d_cn[j] ≤ d_c)/n(d_cn[j]) of the distances d_cn[j], j = 1, 2, . . . , 12, within 100.00 pixels, the given limit d_c, was greater than 0.65, the heuristically given value θ_dc, where the evaluated distance d_cn[j] for each of the digit numbers k in D[j] in pixels was specifically 46.0, 66.9, 27.5, 32.0, 33.0, 16.6, 119.0, 123.0, 38.0, 51.1, 40.7 and 22.6. Therefore, the correctness of the position of the numbers was evaluated to be one. The radius R_cL of the fitted circle F_cL to the contour sensor data was obtained to be 454.7 pixels. The radius R_cN of the fitted circle F_cN to the center points P(x_m[j], y_m[j]), j = 1, 2, . . . , 12, of the cropped number images was obtained to be 351.6 pixels. Now, the positioning of the numbers within the contour was evaluated to be one, since the radius R_cN of the fitted circle F_cN is smaller than the radius R_cL of the fitted circle F_cL. Finally, the total score of the numbers parameter was evaluated to be four.

Figure 8b displays the case in which all the numbers, without any additional numbers, are present in the correct order within the contour but not in proper positions. The classified output D[j], j = 1, 2, . . .
, N, of DeepN, the pre-trained model, was identified to have no missing values, since the total number N was equal to 15; the classified output D[j], j = 1, 2, . . . , N, was in the range from zero to nine, that is, 0 ≤ D[j] ≤ 9 for all j; and the numbers n(D[j] = 1) and n(D[j] = 2) were five and two, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be one. The maximum ratio R_seq in percentage between the number sequence S_N and the reference sequences S_i, i = 1, 2, 3, was evaluated to be 86.66%. Therefore, the correctness of the order of the numbers was evaluated to be one, since the value 86.66 of the maximum ratio R_seq was greater than 65.00, the heuristically given value of θ_n. The ratio n(d_cn[j] ≤ d_c)/n(d_cn[j]) of the distances d_cn[j], j = 1, 2, . . . , 12, within 100.00 pixels, the given limit d_c, was less than 0.65, the heuristically given value θ_dc, where the evaluated distance d_cn[j] in pixels for each of the digit numbers k in D[j] was specifically 243.0, 367.8, 366.93, 594.4, 518.2, 398.0, 491.1, 365.9, 418.2, 404.0, 628.2 and 143.6. Therefore, the correctness of the position of the numbers was evaluated to be zero. The radius R_cL of the fitted circle F_cL to the contour sensor data was obtained to be 447.6 pixels. The radius R_cN of the fitted circle F_cN to the center points P(x_m[j], y_m[j]), j = 1, 2, . . . , 12, of the cropped number images was obtained to be 338.1 pixels. Now, the positioning of the numbers within the contour was evaluated to be one, since the radius R_cN of the fitted circle F_cN is smaller than the radius R_cL of the fitted circle F_cL. Finally, the total score of the numbers parameter was evaluated to be three.

Figure 8c displays the case in which all the numbers, without any additional numbers, are present in the correct order, but not in the proper positions nor within the contour. The classified output D[j], j = 1, 2, . . .
, N of DeepN, the pretrained model, was identified to have no missing values, since the total number N was equal to 15, and the classified output D[j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D[j] ≤ 9 for all j, and the numbers n(D[j] = 1) and n(D[j] = 2) were five and two, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be one. The maximum ratio R seq in percentage between the number sequence S N and the reference sequences S i , i = 1, 2, 3, was evaluated to be 93.33%. Therefore, the correctness of the order of the numbers was evaluated to be one, since the value 93.33 of the maximum ratio R seq was greater than 65.00, the heuristically given value of θ n . The ratio n(d cn [j] ≤ dc )/n(d cn [j]) of the distances d cn [j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit dc , was less than 0.65, a heuristically given value θ dc . Therefore, the correctness of the position of the numbers was evaluated to be zero. The radius R cL of the fitted circle F cL to the contour sensor data was obtained to be 538.5 pixels. The radius R cN of the fitted circle F cN to the center point P(x m [j], y m [j]), j = 1, 2, . . . , 12 of each of the cropped number images I j C , j = 1, 2, . . . , N was obtained to be 694.2 pixels. Now, the positioning of the numbers within the contour was evaluated to be zero, since the radius R cN of the fitted circle F cN is larger than the radius R cL of the fitted circle F cL . Finally, the total score of the numbers parameter was evaluated to be two. Figure 8d displays the case in which all the numbers without any additional numbers are present in the correct order within the contour, but not in the proper positions. The classified output D[j], j = 1, 2, . . .
, N of DeepN, the pretrained model, was identified to have no missing values, since the total number N was equal to 15, and the classified output D[j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D[j] ≤ 9 for all j, and the numbers n(D[j] = 1) and n(D[j] = 2) were five and two, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be one. The maximum ratio R seq in percentage between the number sequence S N and the reference sequences S i , i = 1, 2, 3, was evaluated to be 100.00%. Therefore, the correctness of the order of the numbers was evaluated to be one, since the value 100.00 of the maximum ratio R seq was greater than 65.00, the heuristically given value of θ n . The ratio n(d cn [j] ≤ dc )/n(d cn [j]) of the distances d cn [j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit dc , was less than 0.65, a heuristically given value θ dc , where the evaluated distance d cn [j] for each of the digit numbers k in D[j] was specifically 618.19, 1106.19, 1408.42, 931.44, 378.00, 185.95, 630.61, 1079.10, 1381.07, 1314.39, 720.95 and 58.69. Therefore, the correctness of the position of the numbers was evaluated to be zero. The radius R cL of the fitted circle F cL to the contour sensor data was obtained to be 554.8 pixels. The radius R cN of the fitted circle F cN to the center point P(x m [j], y m [j]), j = 1, 2, . . . , 12 of each of the cropped number images I j C , j = 1, 2, . . . , N was obtained to be 330.8 pixels. Now, the positioning of the numbers within the contour was evaluated to be one, since the radius R cN of the fitted circle F cN is smaller than the radius R cL of the fitted circle F cL . Finally, the total score of the numbers parameter was evaluated to be three. Figure 8e displays the case in which some numbers are missing and the presented numbers are not in proper positions, but mostly in correct order within the contour. The classified output D[j], j = 1, 2, . . .
, N of DeepN, the pretrained model, was identified to have some missing values, since the total number N was equal to 13, and the classified output D[j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D[j] ≤ 9 for all j, and the numbers n(D[j] = 1) and n(D[j] = 2) were four and two, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be zero. The maximum ratio R seq in percentage between the number sequence S N and the reference sequences S i i = 1, 2, 3, was evaluated to be 85.71%. Therefore, the correctness of the order of the numbers was evaluated to be one, since the value 85.71 of the maximum ratio R seq was greater than 65.00, the heuristically given value of θ n . The ratio n(d cn [j] ≤ dc )/n(d cn [j]) of the distances d cn [j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit dc , was less than 0.65, a heuristically given value θ dc , where the evaluated distance d cn [j] for each of the digit numbers k in D[j] was specifically 497.8, 462.5, 350.7, 399.3, 415.1, 254.5, 208.5, 1037.8, 836.2, 792.1, 743.6 and 952.1. Therefore, the correctness of the position of the numbers was evaluated to be zero. The radius R cL of the fitted circle F cL to the contour sensor data was obtained to be 319.8 pixels. The radius R cN of the fitted circle F cN to the center point P(x m [j], y m [j]), j = 1, 2, . . . , 12 of each of the cropped number images I j C , j = 1, 2, . . . , N was obtained to be 193.2 pixels. Now, the positioning of the numbers within the contour was evaluated to be one, since the radius R cN of the fitted circle F cN is smaller than the radius R cL of the fitted circle F cL . Finally, the total score of the numbers parameter was evaluated to be two. Figure 8f displays the case in which there are additional numbers not belonging to a clock but the numbers are in correct orders within the contour. The classified output D[j], j = 1, 2, . . . 
, N of DeepN, the pretrained model, was identified to have some additional numbers, since the total number N was equal to 42, and the classified output D[j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D[j] ≤ 9 for all j, and the numbers n(D[j] = 1) and n(D[j] = 2) were thirteen and nine, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be zero. The maximum ratio R seq in percentage between the number sequence S N and the reference sequences S i , i = 1, 2, 3, was evaluated to be 83.57%. Therefore, the correctness of the order of the numbers was evaluated to be one, since the value 83.57 of the maximum ratio R seq was greater than 65.00, the heuristically given value of θ n . The ratio n(d cn [j] ≤ dc )/n(d cn [j]) of the distances d cn [j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit dc , was less than 0.65, a heuristically given value θ dc . Therefore, the correctness of the position of the numbers was evaluated to be zero. The radius R cL of the fitted circle F cL to the contour sensor data was obtained to be 694.2 pixels. The radius R cN of the fitted circle F cN to the center point P(x m [j], y m [j]), j = 1, 2, . . . , 12 of each of the cropped number images I j C , j = 1, 2, . . . , N was obtained to be 446.1 pixels. Now, the positioning of the numbers within the contour was evaluated to be one, since the radius R cN of the fitted circle F cN was smaller than the radius R cL of the fitted circle F cL . Finally, the total score of the numbers parameter was evaluated to be two. Figure 8g displays the case in which some numbers are missing and the presented numbers are not in proper positions, but mostly in correct order within the contour. The classified output D[j], j = 1, 2, . . .
, N of DeepN, the pretrained model, was identified to have some missing numbers, since the total number N was equal to 11, and the classified output D[j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D[j] ≤ 9 for all j, and the numbers n(D[j] = 1) and n(D[j] = 2) were two and one, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be zero. The maximum ratio R seq in percentage between the number sequence S N and the reference sequences S i , i = 1, 2, 3, was evaluated to be 66.66%. Therefore, the correctness of the order of the numbers was evaluated to be one, since the value 66.66 of the maximum ratio R seq was greater than 65.00, the heuristically given value of θ n . The ratio n(d cn [j] ≤ dc )/n(d cn [j]) of the distances d cn [j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit dc , was less than 0.65, a heuristically given value θ dc , where the evaluated distance d cn [j] for each of the digit numbers k in D[j] was specifically 87.7, 68.9, 56.1, 78.0, 163.0, 190.1, 232.8, 265.3, 894.3, 860.6, 802.5 and 990.8. Therefore, the correctness of the position of the numbers was evaluated to be zero. The radius R cL of the fitted circle F cL to the contour sensor data was obtained to be 242.4 pixels. The radius R cN of the fitted circle F cN to the center point P(x m [j], y m [j]), j = 1, 2, . . . , 12 of each of the cropped number images I j C , j = 1, 2, . . . , N was obtained to be 149.9 pixels. Now, the positioning of the numbers within the contour was evaluated to be one, since the radius R cN of the fitted circle F cN is smaller than the radius R cL of the fitted circle F cL . Finally, the total score of the numbers parameter was evaluated to be two. Figure 8h displays the case in which many numbers are missing and the presented numbers are not in proper positions and correct order, but within the contour. The classified output D[j], j = 1, 2, . . .
, N of DeepN, the pretrained model, was identified to have some missing numbers, since the total number N was equal to five, and the classified output D[j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D[j] ≤ 9 for all j, and the numbers n(D[j] = 1) and n(D[j] = 2) were one and two, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be zero. The maximum ratio R seq in percentage between the number sequence S N and the reference sequences S i , i = 1, 2, 3, was evaluated to be 42.10%. Therefore, the correctness of the order of the numbers was evaluated to be zero, since the value 42.10 of the maximum ratio R seq was less than 65.00, the heuristically given value of θ n . The ratio n(d cn [j] ≤ dc )/n(d cn [j]) of the distances d cn [j] within 100.00 pixels, a given limit dc , was less than 0.65, a heuristically given value θ dc . Therefore, the correctness of the position of the numbers was evaluated to be zero. The radius R cL of the fitted circle F cL to the contour sensor data was obtained to be 601.1 pixels. The radius R cN of the fitted circle F cN to the center point P(x m [j], y m [j]), j = 1, 2, . . . , 12 of each of the cropped number images I j C , j = 1, 2, . . . , N was obtained to be 446.3 pixels. Now, the positioning of the numbers within the contour was evaluated to be one, since the radius R cN of the fitted circle F cN is smaller than the radius R cL of the fitted circle F cL . Finally, the total score of the numbers parameter was evaluated to be one.
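The radii R cL and R cN in the cases above come from circles fitted to the contour sensor data and to the number center points. As a minimal sketch of how such a fit could be computed, the following pure-Python Kåsa least-squares circle fit is one plausible realization; the paper does not specify its fitting method, and the function names here are hypothetical:

```python
import math

def _solve3(m, v):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x

def fit_circle(pts):
    """Kasa least-squares circle fit: returns (cx, cy, radius).

    Fits x^2 + y^2 + a*x + b*y + c = 0 to the points in the
    least-squares sense, then recovers the center and radius."""
    n = len(pts)
    sx = sum(p[0] for p in pts); sy = sum(p[1] for p in pts)
    sxx = sum(p[0] * p[0] for p in pts); syy = sum(p[1] * p[1] for p in pts)
    sxy = sum(p[0] * p[1] for p in pts)
    z = [p[0] * p[0] + p[1] * p[1] for p in pts]
    szx = sum(zi * p[0] for zi, p in zip(z, pts))
    szy = sum(zi * p[1] for zi, p in zip(z, pts))
    sz = sum(z)
    a, b, c = _solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]],
                      [-szx, -szy, -sz])
    cx, cy = -a / 2.0, -b / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - c)
```

With the contour pixels from DeepC and the number center points P(x m [j], y m [j]) as the two input point sets, the numbers-within-contour criterion then reduces to comparing the two returned radii.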
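Putting the four criteria together, the numbers-parameter walkthroughs above reduce to a small decision rule. The sketch below uses hypothetical helpers: `max_seq_ratio` approximates the maximum ratio R seq with `difflib` (the paper does not name its matching algorithm, and the reference sequences shown are illustrative, not the paper's S i ), and `score_numbers` applies the quoted thresholds θ n = 65.00 and θ dc = 0.65:

```python
from difflib import SequenceMatcher

# Illustrative reference digit strings for a clock face read in order;
# the paper's actual reference sequences S_i are not reproduced here.
REFS = ["123456789101112", "121234567891011"]

def max_seq_ratio(digits, refs=REFS):
    """R_seq in percent: best similarity between the recognized digit
    string S_N and the reference sequences (one plausible realization)."""
    s = "".join(str(d) for d in digits)
    return max(SequenceMatcher(None, s, r).ratio() * 100.0 for r in refs)

def score_numbers(n_total, counts_ok, r_seq, pos_ratio, r_cn, r_cl,
                  theta_n=65.0, theta_dc=0.65):
    """One point per criterion: presence (N == 15 with the expected digit
    counts), order (r_seq above theta_n), position (fraction of distances
    d_cn within the 100-pixel limit above theta_dc), and numbers inside
    the contour (R_cN < R_cL)."""
    presence = 1 if (n_total == 15 and counts_ok) else 0
    order = 1 if r_seq > theta_n else 0
    position = 1 if pos_ratio > theta_dc else 0
    within = 1 if r_cn < r_cl else 0
    return presence + order + position + within
```

Fed with the values quoted for Figures 8a through 8c, this rule reproduces the reported totals of four, three and two.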

Scoring on Criteria of Hands Parameter
The analytical ability of the pre-trained model DeepH is demonstrated in Figure 9. Figure 9a displays the case in which two hands are present with the proper proportions and target numbers. In this case, the percentage p f h was evaluated to be 100.00%, greater than 65%, the score for θ h1 . Therefore, the presence of two hands was scored to be two. The length difference ∆ h was evaluated to be 89.4 pixels, greater than 30.0 pixels, the score for θ pr . Therefore, the correct proportion of the two hands was also scored to be one. Both of the distances abs(P(x[n h ], y[n h ]) − P(x[h t ], y[h t ])) and abs(P(x[n m ], y[n m ]) − P(x[m t ], y[m t ])) were estimated to be 123.9 and 86.6 pixels, less than 200.0 pixels, the score for ε, respectively. Therefore, both the correctness of the hand and minute target numbers were evaluated to be one. Finally, the total score of the hands parameter was evaluated to be five.
Figure 9b displays another example of the case in which two hands are present with the proper proportions and target numbers, where one of the target numbers is not in the proper position. In this case, the percentage p f h was evaluated to be 73.37%, greater than 65%, the score for θ h1 . Therefore, the presence of two hands was scored to be two. The length difference ∆ h was evaluated to be 219.3 pixels, greater than 30.0 pixels, the score for θ pr . Therefore, the correct proportion of the two hands was also scored to be one. Both of the distances abs(P(x[n h ], y[n h ]) − P(x[h t ], y[h t ])) and abs(P(x[n m ], y[n m ]) − P(x[m t ], y[m t ])) were estimated to be 57.7 and 44.5 pixels, less than 200.0 pixels, the score for ε, respectively. Therefore, both the correctness of the hand and minute target numbers were evaluated to be one. Finally, the total score of the hands parameter was evaluated to be five. Figure 9c displays the case in which two hands are present with the proper proportions, but one of them is not indicating the target number. In this case, the percentage p f h was evaluated to be 65.35%, greater than 65%, the score for θ h1 . Therefore, the presence of two hands was scored to be two. The length difference ∆ h was evaluated to be 101.7 pixels, greater than 30.0 pixels, the score for θ pr . Therefore, the correct proportion of the two hands was also scored to be one. The distance abs(P(x[n h ], y[n h ]) − P(x[h t ], y[h t ])) was estimated to be 110.7 pixels, less than 200.0 pixels, the score for ε, but the distance abs(P(x[n m ], y[n m ]) − P(x[m t ], y[m t ])) was estimated to be 292.0 pixels, greater than 200.0 pixels, the score for ε. Therefore, the correctness of the hand target number was evaluated to be one, but the correctness of the minute target number was evaluated to be zero. Finally, the total score of the hands parameter was evaluated to be four. Figure 9d displays the case in which two hands are present with the proper target numbers but not the proper proportions.
In this case, the percentage p f h was evaluated to be 89.8%, greater than 65.0%, the score for θ h1 . Therefore, the presence of two hands was scored to be two. The length difference ∆ h was evaluated to be −63.9 pixels, less than 30.0 pixels, the score for θ pr . Therefore, the correct proportion of the two hands was scored to be zero. Both of the distances abs(P(x[n h ], y[n h ]) − P(x[h t ], y[h t ])) and abs(P(x[n m ], y[n m ]) − P(x[m t ], y[m t ])) were estimated to be 60.5 and 109.5 pixels, less than 200.0 pixels, the score for ε, respectively. Therefore, both the correctness of the hand and minute target numbers were evaluated to be one. Finally, the total score of the hands parameter was evaluated to be four. Figure 9e displays the case in which two hands are present with the proper proportions but not the proper target numbers. In this case, the percentage p f h was evaluated to be 91.1%, greater than 65.0%, the score for θ h1 . Therefore, the presence of two hands was scored to be two. The length difference ∆ h was evaluated to be 37.80 pixels, greater than 30.0 pixels, the score for θ pr . Therefore, the correct proportion of the two hands was also scored to be one. Both of the distances abs(P(x[n h ], y[n h ]) − P(x[h t ], y[h t ])) and abs(P(x[n m ], y[n m ]) − P(x[m t ], y[m t ])) were estimated to be 610.1 and 540.1 pixels, greater than 200.0 pixels, the score for ε, respectively. Therefore, both the correctness of the hand and minute target numbers were evaluated to be zero. Finally, the total score of the hands parameter was evaluated to be three. Figure 9f displays the case in which only one hand is present with the proper target number. In this case, the percentage p f h was evaluated to be 64.4%, less than 65.0%, the score for θ h1 , and greater than 50.0%, the score for θ h2 . Therefore, the presence of two hands was scored to be one.
The distance abs(P(x[n h ], y[n h ]) − P(x[h t ], y[h t ])) was estimated to be 133.6 pixels, less than 200.0 pixels, the score for ε, but the distance abs(P(x[n m ], y[n m ]) − P(x[m t ], y[m t ])) was estimated to be 1888.6 pixels, greater than 200.0 pixels, the score for ε. Therefore, the correctness of the hand target number was evaluated to be one, but the correctness of the minute target number was evaluated to be zero. Finally, the total score of the hands parameter was evaluated to be two. Figure 9g displays the case in which only one hand is present with neither the proper proportions nor the target numbers. In this case, the percentage p f h was evaluated to be 63.7%, less than 65.0%, the score for θ h1 , and greater than 50.0%, the score for θ h2 . Therefore, the presence of two hands was scored to be one. The distance abs(P(x[n h ], y[n h ]) − P(x[h t ], y[h t ])) was estimated to be 229.63 pixels, greater than 200.0 pixels, the score for ε. Therefore, the correctness of the hand target number was evaluated to be zero. Finally, the total score of the hands parameter was evaluated to be one. Figure 9h displays the case in which no hands are present. Therefore, the total score of the hands parameter was evaluated to be zero.
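The hands-parameter cases in Figure 9 follow a single threshold scheme (θ h1 = 65%, θ h2 = 50%, θ pr = 30 pixels, ε = 200 pixels). A hedged sketch of that scheme, with hypothetical function and parameter names, is:

```python
def score_hands(p_fh, delta_h, d_hour, d_min,
                theta_h1=65.0, theta_h2=50.0, theta_pr=30.0, eps=200.0):
    """Hands-parameter score (0-5), following the thresholds quoted in
    the text: p_fh is the hand-presence percentage, delta_h the
    minute-minus-hour length difference in pixels, and d_hour/d_min the
    pixel distances from each hand to its target number (None when a
    hand or measurement is absent)."""
    if p_fh > theta_h1:
        presence = 2          # two hands detected
    elif p_fh > theta_h2:
        presence = 1          # only one hand detected
    else:
        return 0              # no hands: total score is zero
    proportion = 1 if (presence == 2 and delta_h is not None
                       and delta_h > theta_pr) else 0
    hour_ok = 1 if (d_hour is not None and d_hour < eps) else 0
    minute_ok = 1 if (d_min is not None and d_min < eps) else 0
    return presence + proportion + hour_ok + minute_ok
```

Applied to the values quoted for Figures 9a, 9e, 9f and 9h, this rule reproduces the reported totals of five, three, two and zero.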

Scoring on Criteria of Center Parameter
For the presence or inference of a center in the hand drawn images, Figure 10 presents four cases: three in which a center point was detected or inferred, and one with no center point. In each case, the original clock drawing image example (left) and the corresponding parameter values (right) are presented. Figure 10a displays the case in which two hands are present, so the center is inferred. In this case, the percentage p f h was evaluated to be 100.00%, greater than 65.0%, the score for θ h1 . Therefore, the presence of a center point was evaluated to be one. Figure 10b displays the case in which only one hand is present, so the center is inferred. In this case, the percentage p f h was evaluated to be 64.4%, less than 65.0%, the score for θ h1 , and greater than 50.0%, the score for θ h2 . This case was also evaluated to be one for the presence of a center point. Figure 10c displays the case in which no hands are present, but a data point exists near the center of the contour. Here, the number of the sensor data points P(x[n], y[n]) with distance abs(P(x[n], y[n]) − P(x c mid , y c mid )) less than 75.0 pixels, the given heuristic value of ε c , from the predefined center point P(x c mid , y c mid ) was evaluated to be 94. The presence of a center point was therefore inferred and evaluated to be one. Figure 10d shows the case in which no center point is present or inferred. In this case, there are no hands, and the number of the sensor data points P(x[n], y[n]) within 75.0 pixels, the given heuristic value of ε c , of the predefined center point P(x c mid , y c mid ) was evaluated to be zero. Therefore, the presence or the inference of a center was evaluated to be zero.
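The center-parameter rule above can be sketched as a small check, assuming hypothetical names and using the quoted heuristic ε c = 75 pixels:

```python
def center_present(p_fh, points, cx, cy, theta_h2=50.0, eps_c=75.0):
    """Center-parameter score (0 or 1): a center is inferred when at
    least one hand is detected (p_fh above theta_h2), or when any
    sensor data point lies within eps_c pixels of the predefined
    center point (cx, cy)."""
    if p_fh > theta_h2:
        return 1
    near = sum(1 for (x, y) in points
               if ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 < eps_c)
    return 1 if near > 0 else 0
```

This mirrors the four cases of Figure 10: hands present (10a, 10b), no hands but points near the center (10c), and neither (10d).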

Performance Test Result
A total of 219 drawing images were used to test the performance of the scoring method by mCDT. Table 5 summarizes the frequency of the ground truth for the 219 images with the score in each of the parameters. For the parameter contour, the frequencies were 217, 178, and 215, with errors in estimation of 6, 13, and 1, respectively, for the criteria of the circular contour, closed contour, and appropriately sized contour. For the parameter numbers, the frequencies were 153, 181, 88, and 202, with errors in estimation of 11, 5, 2, and 2, respectively, for the criteria of all the numbers present without additional numbers, numbers in correct order, numbers in correct positions, and numbers within the contour. For the parameter hands, the frequencies were 171, 181, 170, 153, and 149, with errors in estimation of 13, 6, 13, 1, and 6, respectively, for the criteria of presence of two hands, presence of one hand, correct proportion of two hands, hour target number indication and minute target number indication. For the parameter center, the frequency was 190, with an error in estimation of three for the criterion of presence or inference of a center. Tables 6 and 7 list the distribution of the estimated scores and the performance of each scoring parameter, respectively, in total as well as in the two separate groups, young volunteers and PD patients. As shown in Table 7, for the parameter contour, the sensitivity, specificity, accuracy and precision values were 89.33%, 92.68%, 89.95% and 98.15%; for numbers, they were 80.21%, 95.93%, 89.04% and 93.90%; for hands, they were 83.87%, 95.31%, 87.21% and 97.74%; and for center, they were 98.42%, 86.21%, 96.80% and 97.91%, respectively.

Table 6. Distribution of the estimated scores for each scoring parameter in mCDT.

Table 7. Performance of the scoring parameters in mCDT.

Discussion

Multiple studies have indicated that a number of brain regions are recruited for the tasks required in the CDT; these include the temporal lobes, frontal and parietal lobes, in addition to the cerebellum, thalamus, premotor area and inferior temporal sulcus, the bilateral parietal lobe, and the sensorimotor cortex [39,40]. What is not clearly known is which portions of cognitive function are required for recruiting these areas, as with the conventional CDT such an association and quantitation would be difficult to accomplish. Our study sought to address this requirement of the CDT, and by introducing mCDT as a mobile phone application with a qualitative, automatic scoring system for the CDT, this may have been realized. As elaborated previously, the mCDT scoring system was constructed using CNN, a convolutional network for digit classification, U-Net, a convolutional network for biomedical image segmentation, and the MNIST (Modified National Institute of Standards and Technology) database. The sensor data are also collected by mCDT. From the performance test results, the scoring algorithm in mCDT is efficient and accurate when compared with those of the traditional CDT. In addition, mCDT is able to evaluate the relevant components of cognitive function. The subjects in our study carried out the drawings with a smart pen on a smartphone screen when required to reproduce figures, in a setting similar to the conventional CDT using pen and paper. This method also allows for increased accuracy in gauging the drawing process and minimizes any noise in an assay for activated brain function. The smartphone can also provide the motor-related markers of speed and pausing as the test is being carried out; in a conventional CDT pencil and paper test, such motor-function measures may not be easily implemented.
In summary, our study introduces the developed mCDT as a tool for increasing the accuracy required for cognitive function evaluation in the CDT. As described in the performance test results, mCDT showed fairly good statistical indicators, with especially excellent values in specificity and precision. Furthermore, the values of specificity and precision for the PD patient group were better than those for the young volunteer group, which suggests that mCDT classifies the two groups well and consistently, so that it is applicable as a diagnostic tool in neurological disease groups and also as a correlation tool between the scores of each criterion and the regional functions of the degenerated brain. Of course, the ability as a correlation tool needs to be investigated in future work, some preliminary studies of which are ongoing with several clinical groups in collaboration with primary care physicians and neurology subspecialists. Furthermore, since the presented CDT scoring method uses sensor data collected from a smart mobile device and deep-learning-based algorithms for the CDT image segmentation and processing, other stroke behavior patterns due to neurological disease symptoms, such as motor, memory, and cognitive disorders, could additionally be extracted using the stroke speed variation and touch event sequence patterns that can be estimated from the sensor data, even though the CDT scoring is limited to four parameters with thirteen criteria.

Conclusions
In this study, a mobile phone application, mCDT, for the CDT was implemented, and an automatic and qualitative scoring method with thirteen criteria was developed using mobile sensor data and the deep learning algorithms U-Net and CNN. A young healthy volunteer group (n = 238, 147 males and 89 females, aged 23.98 ± 2.83 years) and a PD patient group (n = 140, 76 males and 64 females, aged 75.09 ± 8.57 years) were recruited and participated in training the models DeepC, DeepH and DeepN, and in validating the performance of the CDT scoring algorithm. Most of the overall statistical indicators (sensitivity, specificity, accuracy and precision) were greater than 85% in the performance validation with 79 of the young healthy volunteers and the 140 PD patients; the two exceptions were the sensitivities of the numbers and the hands parameters. In particular, the specificities of the contour and hands parameters in the young volunteer group were far lower (60.00% and 66.67%, respectively), because the numbers of true negatives and false positives were much smaller and in relatively similar proportions. Furthermore, the specificities and the precisions of the PD patient group were better than those of the young volunteer group, which suggests that mCDT, along with its scoring method, can be used as a tool for classifying neurological disease groups and also for scaling disease symptoms related to degenerated regions of the brain. Further clinical studies should be established for differentiating neurological disease subtypes, being valuable in clinical practice and for studies in the field.

Institutional Review Board Statement:
The Institutional Review Board of the Hallym University Sacred Heart Hospital approved the data gathering and the protocols used for this study (IRB number: 2019-03-001).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy issue.

Conflicts of Interest:
The authors declare no conflict of interest.