Automated Bone Age Assessment with Image Registration Using Hand X-ray Images

Abstract: One of the methods for identifying a growth disorder is skeletal bone age assessment. A child with a healthy growth rate will have approximately the same chronological and bone ages. It is important to detect any growth disorder as early as possible, so that mitigating treatment can be administered with fewer negative consequences. Currently, the most popular approach for assessing the discrepancy between bone and chronological ages is the subjective Tanner–Whitehouse protocol, which scores selected regions in hand X-ray images. This approach relies heavily on the experience of the medical personnel, which produces a high intra-observer bias. Therefore, an automated bone age prediction system with image registration using hand X-ray images is proposed in order to complement inexperienced doctors by providing a second opinion. The system relies on an optimized regression network using a novel residual separable convolution model. The regressor network takes a 299 × 299 pixel input image, which is mapped to the predicted bone age through three modules of the Xception network. Beforehand, the images are pre-processed, or registered, into a standardized and normalized pose using separable convolutional neural networks. A three-step image registration is performed by segmenting the hand region, rotating it using an angle calculated from four keypoints of interest, and finally applying positional alignment to ensure the region of interest is located in the middle of the image. The hand segmentation is based on the DeepLab V3 plus architecture, while the keypoint regressor for angle alignment is based on the MobileNet V1 architecture, where both use separable convolution as the core operator. To avoid the pitfall of overfitting, synthetic data are generated using various rotation angles, zooming factors, and shearing in order to augment the training dataset. 
The experimental results show that the proposed method returns the lowest mean absolute error and mean squared error of 8.200 months and 121.902 months², respectively. An error of less than one year is acceptable in predicting bone age, and hence the system can serve as a good supplementary tool for providing a second expert opinion. This work does not consider gender information, which is crucial for making a better prediction, as male and female bone structures are naturally different.


Introduction
It is important to identify a growth disorder as early as possible, as it is normally treatable in its early stage, such as in the case of Lionel Messi, the world's best footballer [1]. Through systematic hormonal therapy, he managed to avoid a short stature problem, which is a critical factor in being a successful athlete. One of the methods used by pediatricians to assess the growth level of a child is skeletal bone age assessment [2]. This test is used to identify any growth disorder, in which the skeletal bones might be overdeveloped or underdeveloped with regard to the child's chronological age. The primary cause of this disorder can be attributed to a lack of nutrition, genetics-related disease, or problems in hormonal secretion [3]. Generally, bone age is diagnosed through a radiographic X-ray image of the left hand, which is usually the non-dominant hand. X-ray is a powerful imaging modality that is even used in astronomy-related observations [4]. Age differences can be observed through an X-ray image, especially in regions with bone growth plates, which become thinner as a child grows older and disappear entirely in adulthood. Since the early 1960s, the most popular assessment has been the Greulich and Pyle (G&P) method [5], which subjectively compares the captured hand X-ray with a set of hand atlas images. In general, the reliability of this method is low due to high inter- and intra-observer bias, where clinical experience plays a crucial role in reporting the right assessment. Rather than looking at the whole X-ray image, the Tanner–Whitehouse (TW) method [6] examines specific regions of the X-ray image in order to reduce assessment subjectivity [7]. Figure 1 shows the regions of interest of the TW method, where the primary areas are the epiphysis and metaphysis of the carpal and phalange bones. 
To be specific, only the phalanges of three fingers are analyzed: the thumb, middle, and pinky fingers. The final bone age assessment is calculated by summing the scores of all regions of interest. Even though observer bias is reduced by observing dedicated regions with the TW method, it still requires extensive experience for doctors to accurately predict the bone age. Hence, a computer-assisted system will be a good supplementary tool in helping them make the prediction. In the early development of automated bone age assessment systems, conventional machine learning approaches such as neural networks (NN) [8] and support vector regression (SVR) [9] were extensively used to predict the bone age from handcrafted features. These types of features are manually engineered and hence are not optimally designed with regard to the problem. Therefore, a deep learning approach was proposed in [10] to optimally learn the unique features, which are then utilized for skeletal bone age prediction based on hand X-ray images. The deep learning approach mainly uses convolutional neural networks to extract spatial information from various filters at various network levels, which is then classified or regressed using dense neural network layers. This architecture has been successfully implemented in many biomedical applications, including eye disease detection [11], Alzheimer's diagnosis [12], COVID-19 screening [13], physiotherapy [14], and cardiac analysis [15]. However, the input images come in various sizes and conditions, where some images will be relatively small for a newborn baby and vice versa for a late teen. Moreover, there is no standardization of the hand pose while capturing the X-ray images, where the hand might be tilted at certain angles.
Therefore, we propose an image registration approach using a composite function of separable convolutional neural network-based segmentation and a keypoint detector to realign the images into a standard representation. The main advantage of the proposed approach is the lightweight nature of the networks, which use separable convolution as the core operator, where the computational burden is lower when compared to a full convolution architecture. The hand segmentation network is based on DeepLab V3 plus [16], and the keypoint regressor network is based on MobileNet V1 [17]. In the end, we apply a residual separable convolution scheme using the Xception [18] network to train a regressor for accurate bone age prediction that produces a mean absolute error of less than 0.7 years. This network uses a unique three-layer residual separable convolution that reduces the probability of diminishing gradient issues during the training phase. Moreover, it is also in tune with the findings presented in [19], where the network should be designed such that the depth and width of the filters are proportionally balanced to give the best classification or regression performance. The proposed system, which utilizes separable convolution extensively, has been tested on an X-ray image database that covers an age range of 1 to 228 months. A large dataset allows the network to better generalize the mapping between the X-ray images and the predicted age. Thus, several works [20,21] that are based on a small dataset will not be considered as benchmark methods; hence, all performance comparisons are benchmarked against state-of-the-art deep learning networks. Data augmentation has also been applied to further improve the training process by using shearing, flipping, and contrast variation operations. Note that the proposed method does not utilize gender information in predicting the bone age, so that a more general prediction network can be produced. 
In some cases, gender information is not available to health practitioners due to privacy reasons. Therefore, a general network that is based on the X-ray image alone is more desirable. According to the report in [22], the best-performing methods used both the X-ray images and gender information in designing their regression networks, and were validated on a small testing dataset of just 200 images. Contrary to this approach, this work divides the large dataset of 12,811 X-ray images into training, validation, and testing sets according to the standard 8:1:1 data division strategy [23], which results in a sizable testing set of 1281 images.
This work is organized into seven sections. Section 2 discusses some conventional machine learning methods that have been applied to bone age assessment, followed by convolutional neural network approaches. The radiographic X-ray dataset of the hand used for training and testing the algorithm is described in Section 3. Subsequently, Section 4 explains the proposed image registration used to standardize and normalize the input X-ray images. Section 5 explains the applied Xception network regressor, which uses residual separable convolution to make an accurate prediction of the bone age. After that, Section 6 is dedicated to the discussion of results, where the proposed method's performance is compared to other state-of-the-art deep learning models. Concise conclusions and suggestions for future work are given in the last section.

Related Work
A recent systematic review conducted by Dallora et al. [24] shows that many automated bone age assessment methods still rely on conventional machine learning, where 22 out of the 26 reviewed methods applied a hand-crafted feature extractor as input to a regressor model. In [25], optimal fusion rules are analyzed in order to find the best combination weights for the 17 regions of interest, where the extracted features are then passed to a least-squares classifier. Rather than using a single method to extract the features, the work presented in [26] combines three feature extractors, namely histogram of oriented gradients, local binary patterns, and scale-invariant feature transform, which are then fed to a support vector machine (SVM) classifier. Although the extracted features cover various unique traits of the X-ray images, the computational burden is too high. In [9], a combination of SVM and SVR is used to complement the training process, where a cross-correlation function between the tested regions of interest is used to produce similarity scores. On the other hand, Gertych et al. [27] utilize wavelet features coupled with an 11-class fuzzy classifier to predict the bone age. The method presented in [28] combines both fuzzy and neural network (NN) classifiers to better predict the bone age using a combination of features extracted from each region of interest based on the TW protocol. Contrary to the work in [28], Kashif et al. [8] use an NN classifier to identify the best handcrafted features for bone age assessment among five keypoint detectors: SIFT, SURF, BRIEF, BRISK, and FREAK. They found that the SIFT detector produces the lowest mean error when coupled with an NN classifier. Instead of using an iterative training process for the NN classifier, the work in [29] opts to use an extreme learning machine to train a single-hidden-layer NN using Moore–Penrose generalized inverse matrices. 
The weights between the input and hidden nodes are randomly assigned and remain fixed, while only the weights between the hidden nodes and the output are trained using a single-step update.
Spampinato et al. [10] presented one of the earliest works using convolutional neural networks (CNN) to predict bone age by introducing their BoNet architecture. They experimented with various compact CNN architectures, where the best prediction was obtained using a set of five CNN layers of 96, 2048, 1024, 1024, and 1024 filters. Usually, a compact CNN network consists of three to five CNN layers at most [30], and it is usually first pre-trained in a different domain to reduce the possibility of overfitting [31]. BoNet does not apply any pre-processing step, which leads to difficulty if the input images are not captured in a standard orientation and scale. Therefore, the work in [32] applied the U-Net architecture to segment the hand regions from the background. The VGG-16 network is then used to perform regression for the bone age assessment. They assessed three variants of regions of interest, namely the whole X-ray image, the carpal region, and the phalange regions, where the whole X-ray image mode returned the lowest mean absolute error. Similarly, the work in [33] also applied a combination of U-Net segmentation and the VGG-16 architecture for bone age prediction. However, the VGG-16 architecture is applied as a classifier instead of a regressor, where the network is trained for a large number of output classes (240 classes). Zhao et al. [34] then improved on the network training process by using paced transfer learning, where the regressor network is tuned from the top layer first down to the bottom layer. The performance of several deep learning architectures, including AlexNet, GoogleNet, and VGG-16, is analyzed in [35] using normalized and resized input images. They generated synthetic data to augment their training dataset using geometric transformations, photometric transformations, noise injection, and color jittering. Mask R-CNN is used in the work by Wu et al. [36] to segment the hand from an X-ray image, where a residual attention network is then used to predict the bone age. It is a unique end-to-end approach, where the segmentation and regression networks are trained together rather than separately. Instead of looking at the whole hand region, the work in [37] locates each of the 13 regions in the TW protocol and classifies them into one of the age categories using the VGG-16 architecture. The final bone age prediction is obtained by adding up the scores from all 13 regions.

X-ray Image Dataset
The dataset used to validate the automated bone age prediction system is obtained from the Radiological Society of North America (RSNA) Pediatric Bone Age Machine Learning Challenge [22]. The total number of hand X-ray images is 14,236, which were originally arranged into training, validation, and testing sets of 12,611, 1425, and 200 images, respectively. All of the X-ray images are stored in the Portable Network Graphics format, where the highest resolution is 2460 × 2970 pixels and the lowest is 800 × 1011 pixels. The dataset was collected from two children's hospitals in the United States of America: Children's Hospital Colorado and Lucile Packard Children's Hospital. The age range of the dataset is from one month to 228 months. The number of male samples is slightly higher than the number of female samples, with male samples making up around 54.13% of the total. Gender information is annotated for each of the X-ray image samples, but we omit this information in our research so that a more general regressor can be developed. Six medical practitioners were tasked with annotating the ground truth age, where a weighted mean is used to calculate the final age values. This RSNA dataset poses various challenges for an automated system because of the heterogeneous contrast values and non-standard pose images, as shown in Figure 2. Furthermore, some sample images are much smaller than others, as the size is not normalized beforehand. Because our proposed method utilizes image registration to standardize the X-ray images, the original validation dataset is omitted, as it is used to create the training set for our hand segmentation and four-keypoint detector networks. 
Thus, the total number of X-ray images utilized in our experiment is reduced to 12,811 images, which are then divided into training, validation, and testing sets according to the ratio of 8:1:1, respectively. The original 200 testing images, as proposed by the RSNA challenge, are not ideal for performance comparison, as the set is too small compared to the total number of images in the training dataset.
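As a rough sketch of the 8:1:1 division described above (assuming a simple seeded random shuffle; the paper does not specify the exact splitting procedure, and `split_dataset` is an illustrative name):

```python
import random

def split_dataset(image_ids, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split a list of image IDs into train/validation/test subsets (8:1:1)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

# With 12,811 images the split gives roughly 10,248 / 1281 / 1282 images
# (integer rounding puts the remainder into the test set).
train, val, test = split_dataset(range(12811))
print(len(train), len(val), len(test))  # 10248 1281 1282
```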

Image Registration
Image registration is the transformation process that geometrically aligns all the images into a standardized form [38], where the segmented hand region is positioned in the middle of the image in an upright position, with a 90° angle between the middle finger and the horizontal axis. It is an important step so that a regressor can be trained using a standardized pose, which puts more emphasis on the age differences rather than on the variations of the X-ray images. In this work, three operations are performed in order to register the image into a standardized representation: segmentation, rotation, and alignment, as shown in Figure 3. Let us define a set of input images $X = \{x_1, \ldots, x_{|X|}\}$, where each image $x_i$ is a tensor characterized by its width ($w_i$), height ($h_i$), and channel ($c_i$) information. One of the novelties of this work is the application of separable convolutional neural networks in performing the image segmentation and rotation. Separable convolution is a factorized form of the standard convolution into two operators: a depthwise convolution $P$ and a pointwise convolution $Q$, so that a standard two-dimensional convolution with a kernel $K$ of size $k_w \times k_h \times k_s$ is approximated by the composition $Q(P(\cdot))$. The full registration pipeline can then be written as

$$\hat{X} = I_{\text{align}}(I_{\text{rotate}}(I_{\text{segment}}(X)))$$

The goal of the first step, a masked segmentation, is to extract the hand region from the background. The output of the $I_{\text{segment}}$ function, $X_1$, is an element-wise multiplication between the input image $X$ and the masked label $X_{\text{mask}}$ generated by the segmentation network. This operation removes all background objects, including identification plates, medical tubes and equipment, noise, supporting tools, etc., that could affect the bone age prediction accuracy. 
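To illustrate why the separable factorization is lightweight, the following sketch compares the parameter count of a standard convolution against its depthwise-plus-pointwise factorization (the function names are illustrative, not from the paper):

```python
def standard_conv_params(k_w, k_h, c_in, c_out):
    # Full convolution: one k_w x k_h x c_in kernel per output channel.
    return k_w * k_h * c_in * c_out

def separable_conv_params(k_w, k_h, c_in, c_out):
    # Depthwise (P): one k_w x k_h kernel per input channel,
    # then pointwise (Q): a 1x1 convolution mixing channels.
    depthwise = k_w * k_h * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example: a 3x3 convolution with 728 input and output channels,
# as used in the Xception middle module.
full = standard_conv_params(3, 3, 728, 728)
sep = separable_conv_params(3, 3, 728, 728)
print(full, sep, round(full / sep, 1))  # 4769856 536536 8.9
```

For this layer size the factorization uses roughly nine times fewer parameters, which is the source of the lower computational burden claimed for the registration networks.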
DeepLab V3 plus [16] is used as the segmentation network, with Xception-65 [18] as its backbone. The full segmentation architecture follows the encoder-decoder format, where the encoder network comprises Xception-65 and an atrous spatial pyramid pooling (ASPP) module. On the other hand, the decoder network utilizes two up-sampling steps through a composite operation of a separable convolutional layer and bilinear interpolation. The ASPP module consists of three parallel down-pooling branches with three different dilation rates, $D = \{6, 12, 18\}$, and a residual network as used in [39]. This network requires a total of 25,574,952 parameters and is trained using the Adam optimizer [40] with a fixed learning rate of 0.001. Note that we have used the original validation dataset in [22] to train the segmentation network. Some output samples of the masking operation are shown in Figure 4. Because the segmentation goal is to divide each image into two classes, background and foreground, a binary cross-entropy loss function is used as the objective function to optimize the segmentation accuracy. Let us define the predicted label from the segmentation network as $lb_{\text{segment}}^{\text{predict}}$ and the ground truth label as $lb_{\text{segment}}^{\text{gt}}$; the loss function $L_{\text{segment}}$ is then formulated as

$$L_{\text{segment}} = -\left[ lb_{\text{segment}}^{\text{gt}} \log\left(lb_{\text{segment}}^{\text{predict}}\right) + \left(1 - lb_{\text{segment}}^{\text{gt}}\right) \log\left(1 - lb_{\text{segment}}^{\text{predict}}\right) \right]$$

Subsequently, the second part of the image registration rotates the masked image $X_1$ into a standardized upright position $X_2$ using a rotation based on the angle $\theta$ extracted from the tip of the middle finger and the bottom middle point of the carpal region, as shown in Figure 5a. The goal of this rotation operation is to align the image such that the segmented hand region is in an upright form, with the middle finger parallel to the vertical axis. Only two keypoints are utilized for the angle calculation, while all four keypoints are utilized for the later positional alignment. 
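A minimal NumPy sketch of the binary cross-entropy objective described above (an illustration of the loss, not the authors' implementation):

```python
import numpy as np

def binary_cross_entropy(lb_predict, lb_gt, eps=1e-7):
    """Per-pixel binary cross-entropy between predicted mask probabilities
    and the {0, 1} ground-truth mask, averaged over all pixels."""
    p = np.clip(lb_predict, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(lb_gt * np.log(p) + (1 - lb_gt) * np.log(1 - p))))

# A confident, correct mask yields a small loss; a confidently wrong one a large loss.
gt = np.array([[1.0, 0.0], [1.0, 0.0]])
good = np.array([[0.99, 0.01], [0.99, 0.01]])
bad = np.array([[0.01, 0.99], [0.01, 0.99]])
assert binary_cross_entropy(good, gt) < binary_cross_entropy(bad, gt)
```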
The positional alignment is performed using a rigid translation transformation, so that no sensitive part that is crucial for bone age prediction is affected. The four keypoints are the tips of the thumb ($KP_1$), middle ($KP_2$), and pinky ($KP_3$) fingers, as well as the bottom middle point ($KP_4$) of the carpal region. Each $KP_i$ of point $i$ represents coordinate information in the form of $(m, n)$. The angle is directly calculated as

$$\theta = \tan^{-1}\left(\frac{m_{KP_2} - m_{KP_4}}{n_{KP_2} - n_{KP_4}}\right)$$

The four keypoints are automatically detected using a separable convolutional neural network regressor. The MobileNet V1 architecture [17] has been chosen as the regressor by replacing the last softmax activation function with a linear activation function. The network output is a vector with eight elements, which represent the four coordinates of the points of interest. It is chosen because of its lightweight design, which produces good regression performance at a relatively low computational burden. This selection is in line with the whole design of this work, where a separable convolution scheme is used in both steps of the image registration module, while the final age prediction network also employs this scheme extensively. This network utilizes a set of 13 separable convolution layers and takes input images of 224 × 224 pixels. The regressor is trained using a mean squared error loss function, where $KP^{\text{predict}}$ is the predicted output coordinate from the network, while $KP^{\text{gt}}$ is the ground truth coordinate. Similar to the previous segmentation module, the regressor network is trained using the original validation dataset from Halabi et al. [22]. Hence, the loss function for the keypoint regressor network can be written as

$$L_{\text{keypoint}} = \frac{1}{4} \sum_{i=1}^{4} \left\| KP_i^{\text{predict}} - KP_i^{\text{gt}} \right\|^2$$

Once again, an Adam optimizer with a fixed learning rate of 0.001 is used to train the network for 500 epochs. 
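The angle extraction from the two keypoints can be sketched as follows; the (column, row) coordinate convention and the quadrant-aware `atan2` form are illustrative assumptions, not details taken from the paper:

```python
import math

def rotation_angle(kp_middle_tip, kp_carpal_bottom):
    """Angle (degrees) needed to make the line from the carpal bottom
    midpoint (KP4) to the middle fingertip (KP2) vertical (upright pose)."""
    m2, n2 = kp_middle_tip      # (column, row) of the middle fingertip
    m4, n4 = kp_carpal_bottom   # (column, row) of the carpal bottom midpoint
    # Horizontal offset over vertical extent, measured against the vertical axis.
    return math.degrees(math.atan2(m2 - m4, n4 - n2))

# A hand already upright (fingertip directly above the carpal point)
# needs no rotation:
print(rotation_angle((100, 50), (100, 400)))  # 0.0
# A hand whose fingertip leans to the right yields a positive correction angle.
```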
The resultant $X_2$ images are passed through a positional alignment process, so that the hand is located mostly in the middle region, without skewing towards the right or left side of the image. The alignment ensures that all of the keypoints are symmetrically placed: the horizontal alignment is done by pivoting the middle point between $KP_1$ and $KP_3$ such that it overlaps with the middle point of the image. Likewise, the vertical alignment is done by pivoting the middle point between $KP_2$ and $KP_4$ so that it overlaps with the middle point of the image. Zero padding is added so that the registered image has the same size as the input image. Figure 6 shows some output samples of $\hat{X}$ that have been transformed through the image registration procedure: the first-row samples illustrate the original input images, while the second-row samples illustrate the transformed images.
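The midpoint-pivoting alignment described above can be sketched as a translation computation (the function name and the (column, row) coordinate convention are illustrative assumptions):

```python
def alignment_shift(keypoints, image_size):
    """Translation (dx, dy) that centres the hand: the midpoint of the
    thumb and pinky tips (KP1, KP3) is moved to the horizontal centre,
    and the midpoint of the middle fingertip and carpal bottom point
    (KP2, KP4) to the vertical centre."""
    kp1, kp2, kp3, kp4 = keypoints          # each (m, n) = (column, row)
    width, height = image_size
    mid_h = (kp1[0] + kp3[0]) / 2.0         # horizontal pivot
    mid_v = (kp2[1] + kp4[1]) / 2.0         # vertical pivot
    return width / 2.0 - mid_h, height / 2.0 - mid_v

# A hand whose pivots already sit at the image centre needs no shift:
kps = [(50, 150), (150, 20), (250, 150), (150, 280)]
print(alignment_shift(kps, (300, 300)))  # (0.0, 0.0)
```

The returned (dx, dy) would then be applied as a rigid translation, with zero padding filling the vacated border, matching the description in the text.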

Xception-41 Regressor
In this section, the bone age is predicted by a deep learning regressor using the set of X-ray input images $\hat{X}$ that has been pre-processed into a standardized form. The chosen network is based on the Xception architecture [18] in its 41-CNN-layer version. The primary goal of this work is to design a network that can generalize bone age prediction by deploying a large amount of training data. Because the total number of training, validation, and testing images is large, namely 12,811 images, a deep network is suitable for implementation without worrying much about the overfitting problem. A residual network scheme is therefore crucial in overcoming the challenge of diminishing gradients while training such a deep network. Furthermore, a deep network will use a large number of parameters if standard convolution is used; hence, the alternative separable convolution scheme is implemented to reduce memory consumption. Given these two criteria, the Xception-41 architecture fits these needs perfectly, as it extensively applies a residual separable convolution scheme through eight repeating modules.
Separable convolution is a factorized version of standard convolution using a composite function of depthwise and pointwise convolutions. The bone age is regressed using the RSNA dataset, divided according to the ratio of 8:1:1 for training, validation, and testing, respectively. The network is trained using the Adam optimizer with a variable learning rate $LR$ following a piecewise function that depends on the epoch value ($ep$): its value is fixed for the first 100 epochs and then linearly decreases for another 50 epochs.
The learning rate schedule can be written as

$$LR(ep) = \begin{cases} 0.001, & ep \leq 100 \\ 0.001 - \delta_{LR}(ep - 100), & 100 < ep \leq 150 \end{cases}$$

where $\delta_{LR}$ is the reduction gradient of the learning rate. The full Xception architecture consists of three modules: the entry ($\psi_1$), middle ($\psi_2$), and exit ($\psi_3$) modules, as shown in Figure 7. The regressor output $R_{BA}$ is formulated as

$$R_{BA} = \psi_3(\psi_2(\psi_1(\hat{X})))$$

The entry module $\psi_1$ takes an input image of 299 × 299 pixels and comprises two layers of standard convolution and three layers of separable convolution. Carrying over the definitions from the previous subsection, a single-layer residual convolution can be formulated as

$$\dot{V}(m, n, o_{\text{out}}) = V(m, n, o_{\text{in}}) + X(m, n, o_{\text{in}})$$

Therefore, a two-layer residual convolution can be written as

$$\ddot{V}(m, n, o_{\text{out}}) = V(V(m, n, o_{\text{in}})) + X(m, n, o_{\text{in}})$$

Using the combination of $\dot{V}$ and $\ddot{V}$ interleaved with a maximum pooling operator $M$, the entry module produces output feature maps $\hat{X}_{\psi_1}$ whose spatial size is reduced to $\frac{1}{4}$ of the original $\hat{X}$ size. Taking over from the entry module, the middle module consists of eight repetitions of a three-layer residual convolution network

$$\dddot{V}(m, n, o_{\text{out}}) = V(V(V(m, n, o_{\text{in}}))) + X(m, n, o_{\text{in}})$$

The residual convolution network $\dddot{V}$ for the middle module uses the same number of filters for all three separable convolutions, which is contrary to the popular approach in [41], where a bottleneck scheme is used and the middle layer has the most filters. The residual or skip connection maintains the same size as its input and is combined with the main branch using an addition operator, as shown in Figure 8. Thus, the output feature maps after the middle module $\psi_2$ are

$$\hat{X}_{\psi_2} = \dddot{V}^{(8)}(\hat{X}_{\psi_1})$$

where $\dddot{V}^{(8)}$ denotes eight successive applications of $\dddot{V}$. Finally, the last exit module $\psi_3$ predicts the bone age using a linear activation function. The global average pooling operator $G$, introduced in [42], is used to reduce the feature map size to 1 × 1 × 2048. These 2048 feature values are then passed to a fully connected layer $\mathcal{N}$, whose output directly represents the predicted bone age, so that the exit module can be formulated as

$$R_{BA} = \mathcal{N}(G(\hat{X}_{\psi_2}))$$
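The variable learning-rate schedule can be sketched as a plain function (assuming, for illustration, that the rate decays linearly to zero by epoch 150; the text does not state the final value explicitly):

```python
def learning_rate(ep, base_lr=0.001, hold=100, total=150):
    """Piecewise learning-rate schedule: constant for the first `hold`
    epochs, then linearly decreasing over the remaining epochs."""
    if ep <= hold:
        return base_lr
    delta = base_lr / (total - hold)   # reduction gradient per epoch
    return max(base_lr - delta * (ep - hold), 0.0)

print(learning_rate(1))    # 0.001
print(learning_rate(125))  # ~0.0005, halfway through the decay
print(learning_rate(150))  # ~0.0 at the final epoch
```

In a Keras training loop this function could be plugged into a `LearningRateScheduler` callback, one call per epoch.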

Results and Discussion
The regressor is trained for 150 epochs using an additional augmented dataset, which was produced using the Keras generator. The synthetic data are generated by varying the zoom ranges, rotation angles, and translational positions, as well as by flipping and shearing. Table 1 shows the hyper-parameter setup of the regressor. Two performance metrics are used to validate the experimental test on 1281 images, namely the mean absolute error (MAE) and mean squared error (MSE):

$$MAE = \frac{1}{T_{\text{test}}} \sum_{i=1}^{T_{\text{test}}} \left| lb_{\text{regress}}^{\text{gt}}(i) - lb_{\text{regress}}^{\text{predict}}(i) \right|$$

$$MSE = \frac{1}{T_{\text{test}}} \sum_{i=1}^{T_{\text{test}}} \left( lb_{\text{regress}}^{\text{gt}}(i) - lb_{\text{regress}}^{\text{predict}}(i) \right)^2$$
where $T_{\text{test}}$ is the total number of test images, $lb_{\text{regress}}^{\text{gt}}$ is the ground truth bone age annotated by the pediatricians, and $lb_{\text{regress}}^{\text{predict}}$ is the bone age predicted by the regressor network. For performance comparison, ten other state-of-the-art deep learning networks have been tested using the same image registration technique: MobileNet V1 [17], MobileNet V2 [43], MobileNet V3 [44], ShuffleNet V1 [45], ShuffleNet V2 [46], SqueezeNet, VGG-19 [47], Inception V3 [48], ResNet [49], and DenseNet [42]. Six of these models can be regarded as lightweight models because they use fewer than 10 million parameters. All of the networks were trained until convergence, as shown in the training graphs of Figure 9. Note that the training graphs have been capped at 40 epochs for better visualization. The proposed method obtained the lowest mean squared error at the end of the 40 epochs, as shown by the red line graph. Table 2 shows the experimental results of the bone age prediction of the proposed method and its benchmarked networks. Our method, which is based on the Xception regressor, produced the lowest MAE of 8.200 months and MSE of 121.902 months² using a total of 20,863,529 parameters. The main advantage of the Xception network is its three-layer residual separable convolution unit with 728 filters. Besides, when a down-pooling operator is applied, the skip connection layer is also down-pooled using a convolution operator, thus reducing the probability of the diminishing gradient problem. Moreover, this network also applies a unique composite function of rectified linear unit, convolution layer, and batch normalization operator, while the standard flow for the other networks is convolution layer, rectified linear unit, and then batch normalization operator. 
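The two evaluation metrics can be computed straightforwardly; the following NumPy sketch mirrors their standard definitions (the toy ages are illustrative values, not results from the paper):

```python
import numpy as np

def mae_mse(lb_gt, lb_predict):
    """Mean absolute error and mean squared error over the test set."""
    gt = np.asarray(lb_gt, dtype=float)
    pred = np.asarray(lb_predict, dtype=float)
    err = pred - gt
    return float(np.mean(np.abs(err))), float(np.mean(err ** 2))

# Toy example with bone ages in months:
mae, mse = mae_mse([120, 60, 36], [128, 55, 40])
print(mae, mse)  # ~5.667 and 35.0
```

Note how the single 8-month error dominates the MSE far more than the MAE, which is the behaviour discussed below for the ResNet results.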
The second-best performance is produced by the Inception V3 model, followed by ResNet and DenseNet with MAE values of 9.774, 10.283, and 10.557 months, respectively. Note that the proposed method's prediction is around 1.5 months better than that of the second-best method, Inception V3, while the performance gap between Inception V3 and ResNet is just around 0.5 months. Thus, our method has produced a relatively significant improvement over the other networks. Besides, ResNet produced a better MAE than DenseNet, but its MSE is significantly larger. This is because ResNet generally returns good bone age predictions, but it also returns a few bad predictions that are quite far from the annotated ground truth age. Hence, its squared error is large, as this metric punishes big errors more heavily than the MAE metric does.
It is also interesting to note that the top four methods are produced by networks with more than 10,000,000 parameters. The best-performing lightweight model is MobileNet V1, with an MAE of 10.886 and an MSE of 190.349. Moreover, MobileNet V1 returns the best performance among the MobileNet family architectures. There is a trend among the MobileNet family models whereby a larger number of parameters produces better prediction performance. Similarly, the performance of ShuffleNet V2 is also better than that of ShuffleNet V1, as it uses more parameters, 5,380,761 compared to just 936,457 for ShuffleNet V1. However, the convolution architecture also plays an important role in determining network performance: SqueezeNet, with its expand-and-squeeze scheme, produced a lower MAE than ShuffleNet V1 even though it uses less memory, with just 735,939 parameters. Nevertheless, lightweight model performance is generally lower than that of the standard deep learning models, except for the VGG-19 network. In fact, VGG-19 is the biggest model, with 38,911,041 parameters, but its architecture consists of just a single network flow without any residual or branching operations. Thus, it cannot learn the regression features well and produces relatively weak predictions with an MAE of 14.028. The best-performing model without a residual scheme is Inception V3, but it utilizes a wide network architecture with four parallel branches of different convolution sizes.

Conclusions
In conclusion, the proposed method has produced the best bone age prediction, with the lowest MAE and MSE of 8.200 months and 121.902 months², respectively. The prediction error is therefore well below the 12-month threshold, where a prediction tolerance under a one-year gap is still considered acceptable. The algorithm utilizes a novel image registration technique using separable convolution to automatically segment the hand and locate the keypoints for image rotation correction. Moreover, the regressor network used to predict the bone age utilizes three-layer residual separable convolution units to produce a deep network while maintaining an acceptable model size of around 20,000,000 parameters. The network has also been trained using a variable learning rate whose value decreases linearly with respect to the training epoch. However, the proposed method does not consider gender in making the prediction, a factor that will surely affect the prediction accuracy. Another limitation not considered in this work is the relative size of the captured X-ray image, where some samples are small compared to others. For future work, the proposed network can be further improved by adding a wider set of convolution filters instead of just a single network flow in the main branch. Atrous convolution can also be considered to capture larger feature maps while using a small convolution kernel size.

Conflicts of Interest:
The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Ethical Statement: All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of National Institutes of Health, USA (U24CA180927).

Abbreviations
The following abbreviations are used in this manuscript:

G&P    Greulich and Pyle
TW     Tanner–Whitehouse