Robust Estimation of the Chronological Age of Children and Adolescents Using Tooth Geometry Indicators and POD-GP

Determining the chronological age of children or adolescents is becoming an extremely necessary and important issue. Correct age-assessment methods are especially important in the process of international adoption and in the case of immigrants without valid documents confirming their identity. It is well known that traditional, analog methods widely used in clinical evaluation are burdened with a high error rate and are characterized by low accuracy. On the other hand, new digital approaches appear in medicine more and more often, which allow the increase of the accuracy of these estimates, and thus equip doctors with a tool for reliable estimation of the chronological age of children and adolescents. In this study, the work on a fast and effective metamodel is continued. Metamodels have one great advantage over all other analog and quasidigital methods—if they are well trained, a priori, on a representative set of samples, then in the age-assessment phase, results are obtained in a fraction of a second and with little error (reduced to ±7.5 months). In the here-proposed method, the standard deviation for each estimate is additionally obtained, which allows the assessment of the certainty of each result. In this study, 619 pantomographic photos of 619 patients (296 girls and 323 boys) of different ages were used. In the numerical procedure, on the other hand, a metamodel based on the Proper Orthogonal Decomposition (POD) and Gaussian processes (GP) were utilized. The accuracy of the trained model was up to 95%.


Introduction
Metric age estimation is most often used in anthropology and forensics, but also for doctors in planning and evaluating treatment outcomes, for confirming the age of illegal immigrants or children from international adoptions. Incorrect classification can lead to serious consequences, so it is extremely important to develop reliable procedures for determining the metric age. Subjective, analogue techniques used to assess the development of dentition in a young patient, despite their popularity in the clinical assessment, are characterized by low accuracy. Additionally, clear discrepancies are often noticed between the chronological age and the predicted age determined by means of appropriate scientific atlases, charts and/or tables [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]. The differences can be significant [1][2][3] as they can reach even 36 months [3]. Another disadvantage of the currently used methods is their time-consuming nature. This is mainly due to the fact that the assessment of the development stage of tooth buds must be accurate on the basis of tables and studies by the doctors themselves.
Dental age assessment methods can be divided into (A) clinical methods, where the time of eruption of particular groups of teeth is compared, and (B) pantomographic methods, where the process of mineralization of tooth buds is assessed. During routine innovative methodology based on carefully selected 21 coefficients describing the proportions of the geometrical dimensions of the selected teeth. Measurements are carried out fully automatically using the specialized tools for the analysis of pantomographic images of the patients' dentition. The use of artificial intelligence in the work allowed to obtain very accurate results.
The present work is an extension of the work presented by Zaborowicz [29]. New elements that are additionally included here are: (a) automatic sensitivity analysis of all 21 indices to see which of them carry more information; (b) the use of data compression techniques based on the Proper Orthogonal Decomposition (POD) [30][31][32][33], which allows not only the reduction of the size of the input vector but also the denoising of the data, thereby reducing the risk of overfitting the model; (c) the use of Gaussian processes (GP) [34,35], which allows results and their uncertainty to be obtained. This model is trained once-for-all and can later be used as a ready-made tool for in situ identification of the dental age of new patients. The only requirement is the correct parameterization of the pantomographic image of a patient from the same ethnic group for which the model was constructed, based on the precollected training data. Moreover, in the process of learning hyperparameters, a sensitivity analysis is performed, thanks to which the metamodel can be further optimized in a fully automated way. The proprietary procedures have been implemented in the Mathworks software-Matlab 2021b [36]. The effectiveness of the proposed algorithms for the amplitude vectors truncated to just 7 values reached a classification accuracy of 95%.
The most important improvement compared to the previous work is the ability to perform fully automatic prioritization of geometric indicators without the need to analyze in more detail which ones are more important and which ones are less. This is particularly important in the case of patients who do not have some of the selected features, which may disturb the operation of the algorithms estimating the child's age. Another equally important improvement is the ability to determine the age of the child and adolescent with equally good precision, but at the same time obtaining information about the uncertainty of the measurement in the form of standard deviation.

Research Material-Pantomographic Photos
The research material consisted of 619 digital pantomographic photos of children and adolescents obtained from the patient base of the University Center for Dentistry and Specialist Medicine in Poznan. As tooth development is most visible between the ages of 4 and 18, 296 photos of girls and 323 photos of boys in this age interval were included in the research group. Photographs showing developmental disorders and abnormalities, such as changes in the face, diseases of the hard tissues of teeth and pulp, systemic diseases or developmental defects of the face and teeth, were excluded from the study. All patients were citizens of Poland. The study did not have the characteristics of a medical experiment, therefore the Bioethics Committee of the Medical University of Poznan agreed to use them in this study.
In the conducted research, a set of 21 indicators was used, estimated by Zaborowicz [37]. These indicators were selected to capture most of information about the condition of teeth of children and adolescents. As pantomographic images are not made on a fixed scale, all indicators are calculated as proportions of individual geometric distances, lengths of selected teeth, etc. Please refer to the latest work by Zaborowicz et al. [29], where all the details and descriptions of the indicators can be found. In this work, in order to avoid repetition, only the most important description of the methodology for calculating empirical data is given. Pantomographic photos taken with the Duerr Dental VistaPano S Ceph apparatus were used for the tests. This camera records digital images in the DI-COM 3.0 format, which is supported by the specialized software DBSWIN [38] used for the analysis of 16-bit grayscale, i.e., images dedicated to medicine, including oncology, ophthalmology, cardiology, surgery and dentistry [38]. To collect all indices, i.e., tooth and bone parameters the free, open-source software ImageJ 1.52a [39] was used. Figures 1 and 2 show sample photos and measurements. lengths of selected teeth, etc. Please refer to the latest work by Zaborowicz et al. [29], where all the details and descriptions of the indicators can be found. In this work, in order to avoid repetition, only the most important description of the methodology for calculating empirical data is given. Pantomographic photos taken with the Duerr Dental VistaPano S Ceph apparatus were used for the tests. This camera records digital images in the DI-COM 3.0 format, which is supported by the specialized software DBSWIN [38] used for the analysis of 16-bit grayscale, i.e., images dedicated to medicine, including oncology, ophthalmology, cardiology, surgery and dentistry [38]. To collect all indices, i.e., tooth and bone parameters the free, open-source software ImageJ 1.52a [39] was used. Figures 1 and 2 show sample photos and measurements.

Proper Orthogonal Decomposition
The following training pairs were used here: target vectors ( = 1, … , ) corresponding to the known age of the patient in months (in total = 619 patients aged between 52 to 214 months) and the training data collected in a matrix × in which the -th column is a vector (snapshot) containing geometrical indexes of the teeth (in total = 21 selected indicators) for each patient . As the differences between individual indicators result only from the variability of the parameters sought within a given range, snapshots are often correlated, i.e., they create almost parallel vectors in their -dimensional space ( = 21). In order to minimize these correlations, the POD method is employed here [40,41]. This method is based on the truncation of the information contained in the snapshot matrix = , … . Such compression eliminates the elements of the input vector that are characterized by the least variability (i.e., the system is not sensitive to their change). The mathematical theory and computational procedures related to POD have their origins dating back to the early 20th century and are applicable in various fields [30][31][32][33]42,43]. lengths of selected teeth, etc. Please refer to the latest work by Zaborowicz et al. [29], where all the details and descriptions of the indicators can be found. In this work, in order to avoid repetition, only the most important description of the methodology for calculating empirical data is given. Pantomographic photos taken with the Duerr Dental VistaPano S Ceph apparatus were used for the tests. This camera records digital images in the DI-COM 3.0 format, which is supported by the specialized software DBSWIN [38] used for the analysis of 16-bit grayscale, i.e., images dedicated to medicine, including oncology, ophthalmology, cardiology, surgery and dentistry [38]. To collect all indices, i.e., tooth and bone parameters the free, open-source software ImageJ 1.52a [39] was used. Figures 1 and 2 show sample photos and measurements.

Proper Orthogonal Decomposition
The following training pairs were used here: target vectors ( = 1, … , ) corresponding to the known age of the patient in months (in total = 619 patients aged between 52 to 214 months) and the training data collected in a matrix × in which the -th column is a vector (snapshot) containing geometrical indexes of the teeth (in total = 21 selected indicators) for each patient . As the differences between individual indicators result only from the variability of the parameters sought within a given range, snapshots are often correlated, i.e., they create almost parallel vectors in their -dimensional space ( = 21). In order to minimize these correlations, the POD method is employed here [40,41]. This method is based on the truncation of the information contained in the snapshot matrix = , … . Such compression eliminates the elements of the input vector that are characterized by the least variability (i.e., the system is not sensitive to their change). The mathematical theory and computational procedures related to POD have their origins dating back to the early 20th century and are applicable in various fields [30][31][32][33]42,43].

Proper Orthogonal Decomposition
The following training pairs were used here: target vectors t i (i = 1, . . . , M) corresponding to the known age of the patient in months (in total M = 619 patients aged between 52 to 214 months) and the training data collected in a matrix U N×M in which the m-th column u m is a vector (snapshot) containing N geometrical indexes of the teeth (in total N = 21 selected indicators) for each patient t m .
As the differences between individual indicators result only from the variability of the parameters sought within a given range, snapshots are often correlated, i.e., they create almost parallel vectors in their N-dimensional space (N = 21). In order to minimize these correlations, the POD method is employed here [40,41]. This method is based on the truncation of the information contained in the snapshot matrix U = [u 1 , . . . u M ]. Such compression eliminates the elements of the input vector that are characterized by the least variability (i.e., the system is not sensitive to their change). The mathematical theory and computational procedures related to POD have their origins dating back to the early 20th century and are applicable in various fields [30][31][32][33]42,43].
The matrix U, as defined above, initially collected from M = 619 patients, is used to construct the symmetric, positive (semi)definite matrix D = UU T . Then, by calculating its eigenvalues λ i and the corresponding eigenvectors Φ i an orthonormal matrix A N×M , consisting of the amplitudes a m of the snapshots u m , can be described by the following relationship: The noticed correlation between vectors with geometric indices of individual patients means that many amplitudes a i in the new basis Φ can be neglected. In the literature, one can find mathematical evidence that the negligibility of such amplitudes can be quantified by the eigenvalues λ i of matrix D, see, e.g., [44]. By arranging eigenvalues in descending order, one can notice a specific threshold below which eigenvalues do not significantly increase the cumulative sum of all eigenvalues. So, by keeping only the few, say N largest eigenvalues, with N N, the approximation U N×M of the snapshot matrix U N×M is obtained by using the truncated basis Ω N×M and the corresponding truncated amplitudes X N×M : In cases where the number of training pairs is very large, this procedure turns out to be computationally demanding and often time-consuming, but it is performed only once as preparatory work to generate the matrices Ω and X. After this work is done, each new snapshot u * with geometric indices of the teeth corresponding to the new patient t * (new because it was not used in the learning process) can now be determined by (2) by its truncated amplitude N-vector x(t * ).

Gaussian Processes
Gaussian processes can be illustrated by a linear regression (LR) model, which consists pf a linear function of the model parameters w and a nonlinear function of the input vector (i.e., truncated amplitude vector x): where ϕ j (x) is a fixed basis functions of the input variables (e.g., polynomial or radial basis functions).
For the M given training patterns (x m , t m ), x m being the input vector and t m the response for m = 1 . . . M, the parameters vector w of the linear model might be solved by the penalized least squares method: where Θ N×M is a design matrix with elements defined as θ m (x n ). Here, the regularization parameter α is called a hyperparameter and can be computed by using a validation set or by maximizing evidence of dataset p(t|α ) with respect to α [45] within Bayesian inference. Gaussian processes according to Bayesian theory are a double representation of a linear model [45], while the kernel function is here a GP covariance function. Therefore, the regression model leads to the decomposition of the target variable y(x * ) which become the prediction for the new input vector x * . Now, taking the conditional distribution p( y|t) as the Gaussian distribution, the mean can be determined by: while the covariance by the following formula: where C N×M is the covariance matrix: where β is the variance of the target distribution and I is an identity matrix. The covariance matrix C(x, x ) identifies vectors x and x closely adjacent in the input space, which then generate strongly correlated values of y(x) and y(x ) in the output space. Any function that will generate a specific non-negative covariance matrix can be used as a function of the covariance for any ordered set of (input) vectors (x 1 , . . . , x M ), e.g., a stationary, nonisotropic squared exponential covariance function k(x, x ): where: ω i controls a different distance measure in each i-th dimension; ν controls the vertical scale of the process; b represents the deviation that controls the vertical parallel shift of the Gaussian process. If ω i is small, it means that it has little effect on the input data, therefore the i-th input data is scaled down. In general, hyperparameters play a very important role because they have a direct relationship to the sensitivity of the model with respect to the input parameters, and thus allow us to measure the importance of the input parameters.
Having defined the covariance function, it is possible to make predictions of the new input vectors. Before that, however, it is necessary to determine the hyperparameters In order to find those parameters, one can search for the most probable set by maximizing the log likelihood function given by the following equation: Using any gradient-based optimization algorithms, such as a first-order batch Levenberg-Marquardt Algorithm (LMA) or Trust Region Algorithm (TRA) [46]. In this study, TRA, which provides fast convergence, was used.

Metamodel-Training and Testing
First, the experimental data set was divided into training and test data. A different number of patient data was used for testing the model, ranging from 1/18 of all photos, that is, 34 random vector patterns, to 17/18 of all photos, which consists of 585 records. The remaining records were used to train the model. In the first stage, all 21 geometric indices without the POD procedure were used in order to determine the sensitivity of the model to individual indexes. In a model based on Gaussian processes, sensitivity is automatically determined in the learning process by scaling the hyperparameters. Next, the POD procedure was included, where the number of elements in the truncated amplitude vectors ranged from 1 to 21 in order to check for which value, the model achieves the highest efficiency.
It is expected that the fewer elements in the input vector x, the worse the estimate becomes due to insufficient information carried by the truncated amplitude vectors. Likewise, if there are more items, the estimation error becomes greater as the data contains more noise. Therefore, the search for the optimal truncation level in POD procedure seems to be extremely important and is an alternative to the sensitivity analysis built into the training algorithm in Gaussian processes.

Results
The result of this research is a new stochastic methodology for determining the chronological age of children aged 4 to 18 (52 to 214 months). A set of 21 tooth and bone parameters were used here. Those indicators were developed on the basis of digital pantomographic images by Zaborowicz [37]. A metamodel based on Gaussian processes was used here with and without compressing the input data with proper orthogonal decomposition. Two error measurements were used here to compare the performance of different models, namely scaled mean absolute error (%): and mean absolute error (months): First, a model was built without using the POD procedure. A full 21-element vector with normalization of each element was used for training. The error of the training set (MAE) ranges from 1 to 7.5 months, while the error of the test set (MAE) from 8.8 to almost 20.8 months (see Figure 3), depending on the ratio of the training set to the test set. chronological age of children aged 4 to 18 (52 to 214 months). A set of 21 tooth a parameters were used here. Those indicators were developed on the basis o pantomographic images by Zaborowicz [37]. A metamodel based on Gaussian p was used here with and without compressing the input data with proper or decomposition. Two error measurements were used here to compare the perform different models, namely scaled mean absolute error (%): and mean absolute error (months): First, a model was built without using the POD procedure. A full 21-eleme with normalization of each element was used for training. The error of the tra (MAE) ranges from 1 to 7.5 months, while the error of the test set (MAE) from most 20.8 months (see Figure 3), depending on the ratio of the training set to the   The result of this research is a new stochastic methodology for determining the chronological age of children aged 4 to 18 (52 to 214 months). A set of 21 tooth and bone parameters were used here. Those indicators were developed on the basis of digital pantomographic images by Zaborowicz [37]. A metamodel based on Gaussian processes was used here with and without compressing the input data with proper orthogonal decomposition. Two error measurements were used here to compare the performance of different models, namely scaled mean absolute error (%): and mean absolute error (months): First, a model was built without using the POD procedure. A full 21-element vector with normalization of each element was used for training. The error of the training set (MAE) ranges from 1 to 7.5 months, while the error of the test set (MAE) from 8.8 to almost 20.8 months (see Figure 3), depending on the ratio of the training set to the test set.   Then, a model was prepared in which the cut-off level of the amplitude vector was changed from 1 to 21 elements. Results in the form of calculated eigenvalues of matrix D, cumulative sum of eigenvalues, the SMAE and MAE values for both the training set and the testing set are shown in the Table 1. The cumulative sum was calculated using the following formula: where N * is the cutoff level and N is the number of all eigenvalues (here N = 21).  Table 1 presents the results of the estimation using models with different levels of truncation of the input vector. The results are intentionally presented separately for the training set and the test set as it is obvious that the training set will achieve a much lower MAE or SMAE value. This is especially important when these results are to be compared with the results obtained with other commercial tools, where the quality of the prediction is often given as the weighted error MAE from two sets. This is an obvious false assumption, because summing these values for large training sets that produce very small error values and a small test set that has a relatively large error, the actual fit of the model to the new patterns is completely distorted. Therefore, in the further part of this work, all results obtained by the proposed models will be presented for the test set (1/18 of all set, never used to train the model). Figure 5 shows the graph of the SMAE error for the training and test set depending on the number of amplitudes at the model input. Int. J. Environ. Res. Public Health 2022, 19, x FOR PEER REVIEW 9 of 14 Figure 5. SAME for testing and training set with respect to number of input amplitudes.
The seventh row is marked in the Table 1-in this cut-off level, the smallest SMAE and MAE error for the testing set was obtained. In the further part, the detailed values of the obtained predictions for this particular case are presented. Figures 6 and 7 show the detailed results obtained with the model in which the amplitude vector was cut to 7 elements.   The seventh row is marked in the Table 1-in this cut-off level, the smallest SMAE and MAE error for the testing set was obtained. In the further part, the detailed values of the obtained predictions for this particular case are presented. Figures 6 and 7 show the detailed results obtained with the model in which the amplitude vector was cut to 7 elements. The seventh row is marked in the Table 1-in this cut-off level, the smallest SMAE and MAE error for the testing set was obtained. In the further part, the detailed values of the obtained predictions for this particular case are presented. Figures 6 and 7 show the detailed results obtained with the model in which the amplitude vector was cut to 7 elements.   The seventh row is marked in the Table 1-in this cut-off level, the smallest SMAE and MAE error for the testing set was obtained. In the further part, the detailed values of the obtained predictions for this particular case are presented. Figures 6 and 7 show the detailed results obtained with the model in which the amplitude vector was cut to 7 elements.   A statistical measure of the quality of the model may be the coefficient of determination-R square-which can be determined using the formula: where: M is number of targets, t i is a target, y i is a model prediction, y is a mean of target vector: Figure 8 presents the R square values for testing and training sets of the two models: (1) with 7 amplitudes; (2) with 17 amplitudes. A statistical measure of the quality of the model may be the coefficient of determination-R square-which can be determined using the formula: where: is number of targets, is a target, is a model prediction, is a mean of target vector:

Discussion
From the preliminary case study presented in Figure 3, it can be concluded that the model presented here is practically independent of the ratio of the selected test set to the training set. Only when the training set is below 10% of the whole set, the results obtained with the use of the model for the test set did not exceed 12 months of the MAE

Discussion
From the preliminary case study presented in Figure 3, it can be concluded that the model presented here is practically independent of the ratio of the selected test set to the training set. Only when the training set is below 10% of the whole set, the results obtained with the use of the model for the test set did not exceed 12 months of the MAE error. The best results for the test set were obtained for the ratio (testing to training set) of 10%. Therefore, with more training elements, the test error decreases; however, the MSE error for the test set to the training set ratio at 10% and 90% was only between 9 and 13 months.
The first significant observation from the conducted research is the fact that the model based on all indicators but without the POD procedure is not the most optimal. It is visible in Figure 4 that the model is sensitive to only a few of the selected indicators-the greatest sensitivity of the model is to changes in the index X08, then X01, X14, X07, less to X06, X05, X12, the others have almost no impact on the sensitivity of the model. This observation is the basis for using POD procedures, which allow the reduction of the input vector by cutting off the least important components (amplitudes). It is worth noting that the first 7 amplitudes contain almost 99.9% of the information (see Table 1). Analyzing the SMAE generated by the model during training, it can be seen that this error systematically decreases with the increase in the number of amplitudes at the model input, but the same error for the testing set remains constant starting from three amplitudes (see Figure 5 and Table 1). This means that when training the model with a data vector containing more amplitudes, the noise is also introduced due to measurement errors, inaccuracies, etc. This can also be seen in Figure 8, where the R squared value increases when the model was trained with 17 amplitudes (with respect to model trained with 7 amplitudes), but the same model for the testing set generates a lower value coefficient of determination.
The results presented in Figures 6 and 7 show that the model, in 30 out of 34 cases, generated results with an error of less than 10% (in 15 cases, with an error less than 5%). This is very promising, however it is puzzling why a slightly better result cannot be achieved. For a deeper analysis, a histogram was built (see Figure 9), which shows how many patients were in specific age groups-the largest number of patients is between 80-150 months. For example, for patients in the range of 120-130 months, statistical data on the variability of all indicators are summarized in Figure 10. It can be clearly seen that some parameters have very low variability (X07-X10). error. The best results for the test set were obtained for the ratio (testing to training set) of 10%. Therefore, with more training elements, the test error decreases; however, the MSE error for the test set to the training set ratio at 10% and 90% was only between 9 and 13 months.
The first significant observation from the conducted research is the fact that the model based on all indicators but without the POD procedure is not the most optimal. It is visible in Figure 4 that the model is sensitive to only a few of the selected indicators-the greatest sensitivity of the model is to changes in the index X08, then X01, X14, X07, less to X06, X05, X12, the others have almost no impact on the sensitivity of the model. This observation is the basis for using POD procedures, which allow the reduction of the input vector by cutting off the least important components (amplitudes). It is worth noting that the first 7 amplitudes contain almost 99.9% of the information (see Table 1). Analyzing the SMAE generated by the model during training, it can be seen that this error systematically decreases with the increase in the number of amplitudes at the model input, but the same error for the testing set remains constant starting from three amplitudes (see Figure 5 and Table 1). This means that when training the model with a data vector containing more amplitudes, the noise is also introduced due to measurement errors, inaccuracies, etc. This can also be seen in Figure 8, where the R squared value increases when the model was trained with 17 amplitudes (with respect to model trained with 7 amplitudes), but the same model for the testing set generates a lower value coefficient of determination.
The results presented in Figures 6 and 7 show that the model, in 30 out of 34 cases, generated results with an error of less than 10% (in 15 cases, with an error less than 5%). This is very promising, however it is puzzling why a slightly better result cannot be achieved. For a deeper analysis, a histogram was built (see Figure 9), which shows how many patients were in specific age groups-the largest number of patients is between 80-150 months. For example, for patients in the range of 120-130 months, statistical data on the variability of all indicators are summarized in Figure 10. It can be clearly seen that some parameters have very low variability (X07-X10). Unfortunately, in other age ranges the situation is completely different-for example, the X02 parameter varies in the range from 0.235 to 464.53, similarly the X04 parameter (0.519-323.68) and the X15 parameter (0.214-43.14). This means that some of the indicators are based on the geometry of the teeth, which, for example, have not yet formed or some proportions have been incorrectly determined. Therefore, a certain increase in the accuracy of the model can be obtained, for example, by checking and, if necessary, Unfortunately, in other age ranges the situation is completely different-for example, the X02 parameter varies in the range from 0.235 to 464.53, similarly the X04 parameter (0.519-323.68) and the X15 parameter (0.214-43.14). This means that some of the indicators are based on the geometry of the teeth, which, for example, have not yet formed or some proportions have been incorrectly determined. Therefore, a certain increase in the accuracy of the model can be obtained, for example, by checking and, if necessary, comparing selected indicators and selecting the geometry based on their impact on the sensitivity of the model. comparing selected indicators and selecting the geometry based on their impact on the sensitivity of the model. In order to fairly compare the results obtained with the use of the model presented in this paper with the results obtained with the use of very popular and widely used neural networks based on deep learning, only the test sets obtained by both models were selected for comparison. Deep learning network results for the same case can be found in our recent paper [47]. Since in the work by Zaborowicz et al. [47] the test set consisted of 25% of the whole set, the model presented in this work was also rebuilt so as to fairly compare the results. Ultimately, the error MAE = 8.8 1.0 month was obtained, while the same error for test set obtained with the use of deep learning neural networks was MAE = 10.0 months. This observation allows to draw a general conclusion that PG-POD models can be successfully used to estimate the age of children and adolescents using large sets and are not inferior to other methods generally used in the modern world.

Conclusions
The proposed metamodel based on Gaussian processes with the input data compacting procedure using the proper orthogonal decomposition gives stable results, with mean absolute error at the level of 7.5 months ( 6.12%). Which is a significant improvement over analytical methods that can provide 12 months at best, and a slight improvement over deep learning-based methods ( 8 months). The use of POD made it possible to significantly reduce the input data-down to 7 amplitudes only-without noticeable loss of information, while maintaining only the necessary information. Once prepared (i.e., trained on collected data-panthomographic images), the model can be used to quickly assess the chronological age of children and adolescents from 4 to 18 years of age. For each new patient, the age assessment procedure is as follows: (a) a pantomographic photo is taken; (b) the geometric indices of teeth and bones are recalculated in accordance with the prepared pattern of indexes; (c) the input vector of 21 indexes is projected onto the prepared orthogonal space; (d) amplitudes are truncated to the selected level-in this case to the first 7 values; (e) the amplitude vectors are introduced into the model that generates the age of the young patient. This allowed for an easy (compared to analogue techniques) and precise assessment (up to 7.5 month) of the dental age of a person whose age cannot or is difficult to be determined.  In order to fairly compare the results obtained with the use of the model presented in this paper with the results obtained with the use of very popular and widely used neural networks based on deep learning, only the test sets obtained by both models were selected for comparison. Deep learning network results for the same case can be found in our recent paper [47]. Since in the work by Zaborowicz et al. [47] the test set consisted of 25% of the whole set, the model presented in this work was also rebuilt so as to fairly compare the results. Ultimately, the error MAE = 8.8 ± 1.0 month was obtained, while the same error for test set obtained with the use of deep learning neural networks was MAE = 10.0 months. This observation allows to draw a general conclusion that PG-POD models can be successfully used to estimate the age of children and adolescents using large sets and are not inferior to other methods generally used in the modern world.

Conclusions
The proposed metamodel based on Gaussian processes with the input data compacting procedure using the proper orthogonal decomposition gives stable results, with mean absolute error at the level of ±7.5 months (±6.12%). Which is a significant improvement over analytical methods that can provide ±12 months at best, and a slight improvement over deep learning-based methods (±8 months). The use of POD made it possible to significantly reduce the input data-down to 7 amplitudes only-without noticeable loss of information, while maintaining only the necessary information. Once prepared (i.e., trained on collected data-panthomographic images), the model can be used to quickly assess the chronological age of children and adolescents from 4 to 18 years of age. For each new patient, the age assessment procedure is as follows: (a) a pantomographic photo is taken; (b) the geometric indices of teeth and bones are recalculated in accordance with the prepared pattern of indexes; (c) the input vector of 21 indexes is projected onto the prepared orthogonal space; (d) amplitudes are truncated to the selected level-in this case to the first 7 values; (e) the amplitude vectors are introduced into the model that generates the age of the young patient. This allowed for an easy (compared to analogue techniques) and precise assessment (up to ±7.5 month) of the dental age of a person whose age cannot or is difficult to be determined.

Institutional Review Board Statement:
The Bioethics Committee of the Medical University of Poznań considered that the research carried out does not have the characteristics of a medical experiment and therefore agreed to carry out the relevant work.