Measures of Morphological Complexity of Gray Matter on Magnetic Resonance Imaging for Control Age Grouping

Current brain-age prediction methods using magnetic resonance imaging (MRI) attempt to estimate the physiological brain age via some kind of machine learning of chronological brain age data to perform the classification task. Such a predictive approach imposes greater risk of either over-estimate or under-estimate, mainly due to limited training data. A new conceptual framework for more reliable MRI-based brain-age prediction is by systematic brain-age grouping via the implementation of the phylogenetic tree reconstruction and measures of information complexity. Experimental results carried out on a public MRI database suggest the feasibility of the proposed concept.


Introduction
The ability of computer methods that can predict healthy chronological age of the brain based on radiological images, such as magnetic resonance imaging (MRI), has recently led to a new research direction in computational neuroscience [1][2][3][4][5][6][7].This new type of study is important because it holds promise of being able to train computers to identify neurodegenerative disorders at an early onset, where image samples collected for brain diseases are limited for clinical inference [8,9].In other words, the brain imaging data of healthy or control participants can be utilized as models for machine learning, which is then applied to estimate the brain age of a subject under study based on the brain imaging data of the subject.If the predicted brain age is older than its chronological age, then there is some evidence of its accelerated aging that indicates abnormal cognitive impairment [10,11], or traumatic brain injury [12].
In fact, a normal aging process is known as the progression of gradually accumulated pathologies associated with physical decline, cognitive impairment, and brain-volume loss [13,14].It is well perceived that the degenerations of the brain result in fast aging processes, thus causing brain disorders; and healthy brain imaging data can offer useful medical information for early detection and intervention of these neuropathological and cognitive changes using computational methods [10,15,16].For example, Alzheimer's disease (AD), which is not a normal part of brain aging and known as the most common form of dementia that causes problems with memory, thinking and behavior.AD symptoms usually progress slowly and deteriorate over years to the stage at which daily tasks can be severely hindered [17].Because AD shares many aspects of abnormal brain aging, by applying pattern recognition methods, structural MRI-based features have been discovered as promising biomarkers to identify the early setting of mild cognitive impairment to AD based on the matching of similarity between constructed computational models of healthy and pathological brain aging patterns [11,16].
Most computational methods developed for the detection of early stages of neurodegenerative disease are based on the notion of brain age estimation using the MRI of the brain.The mean age prediction accuracy using hidden Markov models was within the mean absolute error of 2.41 years as reported in [3], 4.98 years using relevance vector machine for regression (RVR) [2], 6.3 years using quantitative brain water maps [18], and within the root mean squared error of 6.5 years using the RVR [19].Regarding the use of brain age estimate for the diagnosis of traumatic brain injury, the suggestion of brain injury was reported when the brain age is estimated to be older than its chronological age with mean errors of 4.66 years and 5.97 years for gray matter (GM) and white matter (WM), respectively [7].While the application of computational models for estimating brain ages using structural MRI to quantify the atrophy of the brain is promising for the early diagnosis of brain diseases, different computational methods provide different results with different respective errors, making it difficult to assess the reliability of the age estimation.Furthermore, although the prediction accuracy can be achieved within an average error of about 5 years, the standard deviation can be as large as 10 years [7], setting the physiological brain age inference with large uncertainty for medical decision making and treatment.Another issue for brain age estimation is that it is difficult to obtain sufficient MRI data of control subjects, which match the age of each individual patient and other confounding factors that correlate with both dependent and independent variables [4].Thus, there is a need to develop some alternative computational methodology for a more reliable prediction of the normal or control brain age using structural MRI data.This is the motivation of this paper that aims to explore the concept of brain age grouping in order to provide a more robust way of physiological inference from control chronological brain age models for more reliable diagnoses of brain disorders.
A methodologically sound procedure for brain age grouping is by way of the concept of phylogeny developed in genomics [20].The hypothesis of phylogeny in molecular biology is that if genomes involve by mutations, then the quantitative difference in a nucleotide sequence between a genome pair proportionally indicates how recently those two genomes shared a common ancestor.Two genomes that diverged in the recent past would be expected to have fewer differences than a pair of genomes whose common ancestor is more ancient.The implication is that by comparing three or more genomes, it is possible to infer not just only the similarities but the evolutionary relationships between them [20].Likewise, chronological brain ages are altered over time and can be divided into separate groups or hybridize together due the complexity of the brain structure.This is analogous to the evolutionary branching process in molecular phylogenetics, which may be depicted with a phylogenetic tree, and the place of each of the various groups of age on the tree is based on a hypothesis about the brain structure in which cerebral atrophy occurred.This hypothesis is also found in comparative linguistics, which is a branch of historical linguistics, where similar concepts are used with respect to relationships between languages [21].Thus, by partitioning brain ages into groups instead of estimating a single chronological age of the brain, we can better reduce the prediction uncertainty, and be able to validate the result by examining the relationships within and between the models built for the brain groups.
To enable the performance of "phylogenetic" analysis for brain age grouping, we need to extract informative features of the brain on MRI so that the dissimilarity or distance matrix between brain groups can be determined.It is known that GM is an important evidence for the assessment of cerebral atrophy revealed by MRI scans of the brain, because GM regions are found to be specifically susceptible to the cognitive decline and AD process [22].Furthermore, the importance of the measurement of structures and changes of the brain during its development, aging, learning, disease and evolution is recognized as an emerging field of neuro-informatics known as brain morphometry, which aims to quantify anatomical features of the brain in terms of shape, mass, and volume using typically MRI data to gain insight into the pattern of the brain and its longitudinal extent across individuals or different biological species [23].
A feasible way for feature extraction is to transform the GM morphology from MRI scans into time series, which was found useful for a study of the complexity of brain folding, cortical surface structure in AD and aging [16].In fact, the human brain is relatively larger and more wrinkled than other species.Brain wrinkles increase its surface, indicating more neurons, therefore more folds of the brain means more surface area and more neurons.Declines in brain function in the elderly are sometimes due to widening of the folds (sulci), resulting in fewer neurons, in comparison with young people [24].The brain outer layer (cerebral cortex) is a central region in the mammalian brain that monitors complex cognitive behaviors [25].Recent scientific evidence has suggested that the size and extent of the folded gray matter of the brain are important factors that influence cognitive abilities and sensorimotor skills [26].Furthermore, the conversion of images into sequences for applications of time-series analysis tools has been utilized for solving several problems in image data mining [27][28][29].Because time-series data of the GM morphology are inherently complex, chaos and nonlinear dynamics analyses of these time-series data are therefore suitable mathematical techniques for extracting their informative statistical properties.Moreover, chaos and nonlinear dynamics have been increasingly reported as effective computational methods for analyzing complex data in medicine and biology [30][31][32][33][34][35][36].We therefore explore several methods of chaos and nonlinear dynamics to extract features from time-series data of the GM spatial structure of the brain on MRI.Such methods include approximate entropy (ApEn) [37], sample entropy (SampEn) [38], regularity dimension (RD) [39], recurrence plots (RP) [40], and the largest Lyapunov exponent (LLE) [41,42].Because dynamic-time warping (DTW) is known as a popular approach for pattern matching in terms of time series [43,44], it is also applied in this study to establish the distance matrix for the tree reconstruction for the proposed brain age grouping, which can be compared with those obtained from the methods of chaos and nonlinear dynamics.
The rest of this paper is organized as follows.Section 2 presents the methods of chaos, nonlinear dynamics, dynamic-time warping, and phylogenetic tree reconstruction, which are implemented in this study.Section 3 describes the database and the preprocessing of the MRI of the brain.Section 4 consists of experimental tests for brain age grouping obtained from several methods, discussion and comparison of the results.Finally, Section 5 is the conclusion of the research, including the summary of new findings, limitation, and issues suggested for future work.

Approximate Entropy and Sample Entropy
The formulation of the approximate entropy (ApEn) [37] has its roots in the correlation dimension [45] and Kolmogorov-Sinai (K-S) entropy [46].However, the correlation dimension and K-S entropy provide procedures for quantifying chaos, while ApEn does not.ApEn only measures how complex or predictable a dynamical system can be.
To assess the dynamical behavior of a time series u = (u 1 , u 2 , . . ., u N ), its dynamics is first reconstructed as a set of vectors of an embedding dimension m: X = (x 1 , x 2 , . . ., x N−m+1 ), where x i = (u i , u i+1 , . . ., u i+m−1 ), i = 1, 2, . . ., N − m + 1. Next, let r be a positive constant that indicates a tolerance for determining the similarity between a pair of the reconstructed vectors.The probability of reconstructed vector x i being similar to x j , j = N − m + 1, is expressed as where θ(d(x i , x j )) is the Heaviside step function defined as The distance measure d(x i , x j ) can be defined as Then, the probability of all the reconstructed vectors being similar to one another can be computed from C i (m, r) by: Finally, with fixed m and r, ApEn is defined as a family of formulas as follows: Given m, r and N, ApEn is defined as a family of statistics: From the above mathematical expressions, C(m + 1, r) will always have a value smaller or equal to C(m, r).Therefore, ApEn(m, r, N) ≥ 0. It can be interpreted that ApEn is the logarithmic likelihood of the reconstructed vectors that are close, and still remain close on the next incremental comparisons.A smaller value of ApEn indicates more self-similarity in data set.The study in [37] suggested that for m = 2 and N = 1000, values of r ranging from 0.1σ to 0.2σ, where σ is the standard deviation of the time series u, would produce reasonable statistical validity of ApEn(m, r, N).
Modifying ApEn, sample entropy (SampEn) [38] does not include self-similar patterns as ApEn does, introducing bias by including self-matching, particularly for a finite value of N. Thus, for given m, r and N, SampEn computes the probability of a reconstructed vector being similar to other reconstructed vectors as Then the probability that two time series match for a dimension m within a tolerance r is calculated by considering only the first N − m reconstructed vectors of dimension m: A family of formula for SampEn is defined as in which the ratio A(m + 1, r)/A(m, r) is interpreted as the conditional probability that two time series within a tolerance r for dimension m remain within r of each other for the next larger dimension [38].
A family of statistics for SampEn is given by Entropy 2015, 17, 8130-8151

Regularity Dimension
The regularity dimension (RD) [39] has been introduced to overcome the difficulty in selecting the two critical parameters m and r of either ApEn or SampEn.Because SampEn is an improved version of ApEn, only the former is discussed in this context.The RD is formulated based on the definition of the power law and SampEn using the general rule of the exponent dimension defined as [47] ∆ ∝ s −D (11) where ∆, ∝, s, and D stand for the number of increments, proportional to, scale size, and exponent dimension, respectively.Rearrangement of Equation ( 11) to obtain the exponent dimension, we have Information, which can be the measure of complexity provided by SampEn, increases with decreasing size of the lag space [42].The lag can be equivalent to the length m defined in SampEn.
In other words, the information is approximately proportional to the log of 1/m (from now on information I is used to refer to SampEn).A straight line of the semi-log axes is therefore expected in a plot of 1/m versus I, and the mathematical relation is a logarithmic equation.Because the relation is a straight line, it has the following form: where I m is SampEn denoting the information subject to m, a is a constant which is the intercept, and D m is the slope of the straight line, which is also the regularity dimension.Rearrangement of Equation ( 13) to solve for D m yields Setting a limit term on m, which indicates that the relation does not hold for very large m, gives When m gets smaller and smaller, 1/m and its log becomes larger and larger.Since a is a constant, the term a log(1/m) approaches zero in the limit of m approaching zero, and therefore this term becomes negligibly small.Thus, Equation ( 15) can be simplified as It can be noted from Equation ( 16) that the regularity dimension D m measures the rate of change of signal regularity/predictability with respect to log(1/m), it is the rate at which the entropy of a dynamical system is gained with increasing resolution (decreasing length m).
Alternatively, increasing the value of r, which is the tolerance of similarity, decreases the information in the sense that no information is gained when most reconstructed vectors are considered to be similar (signal is highly predictable or has low complexity).If r is large enough, then both A(m, r) and A(m + 1, r) become 1, then the value of ApEn is 0 (log(1/1) = 0).Being analogous to the relation of a straight line developed for D m , the regularity dimension D r can be obtained as where I r is ApEn, denoting the information subject to r, a is a constant which is the intercept, and D r is the slope of the straight line, which is the regularity dimension.Rearrange Equation (17), we can define D r as To indicate the relation that does not apply to very large r, a limit term is inserted on r, giving Removing the negligible term a log(1/r) in the limit of r approaching zero gives As from now, it can be seen that either D m expressed in Equation ( 16) or D r expressed in Equation ( 20) can be obtained as the slope of the plot of I m versus log(1/m) or I r versus log(1/r) on the curve which is approximately linear or by least squares fitting of a straight line to the curve, respectively.

Recurrence Plots
A recurrence plot (RP) is used as visual information of an attractor constructed by associated phase-space vectors to study nonlinear dynamics and chaos [40].A plot of numbers of nearest neighbors of a trajectory x i in an embedding dimension versus a set of indices is called an RP, which was introduced in [40] and studied in detail in [48].An RP can be expressed in form of a matrix as [49]: where || • || is a norm, is a threshold distance, and H(•) is the Heaviside step function defined as Thus, the recurrence plot is the visualization of a square recurrence matrix of distance elements within a cutoff threshold.If the distance between two points on the attractor is less than the given threshold, a point is plotted on the matrix of the RP: if R ij = 1 then a black dot is plotted at the location (i, j) of the matrix, which is otherwise plotted with a white dot.Since any trajectory is self-recurrent, that is R ii = 1; therefore an RP is always represented with a black main diagonal that is called the line of identity.An RP is characterized with various visual patterns, which can be helpful for gaining insights into the dynamics underlying the system under study.Its patterns are classified according to topological structures such as homogeneity, periodicity, drift, and transience [46].
These components of an RP describe an individual attractor and can quantify features according to a histogram of the diagonal line in the RP.A diagonal line of length l, where the path of a segment of the trajectory is almost parallel to another segment, is defined as A vertical line of length v is defined as Entropy 2015, 17, 8130-8151 The histogram of the number of length l in the RP is expressed by: The diagonal line, vertical line, and histogram are the basis of the RP for deriving features to quantify the complexity of a dynamical system.This RP-based feature analysis is known as recurrence quantification (RQ).Three commonly used features of the RQ are the recurrence rate (RR), determinism (DET), and recurrence entropy (RE) [50].The RR is a measure of the density of recurrence points in the RP.The DET is the ratio of recurrence points that configure the diagonal lines of at least length l min to all recurrence points in the matrix.The RE is formulated in terms of the Shannon entropy to measure the uncertainty of the RP with respect to the diagonal lines.These three common features are mathematically defined as follows [49]. where

Largest Lyapunov Exponent
One of the most well-known method for quantitative measures of chaos is the largest Lyapunov exponent (LLE) [42,51].The value of the LLE can be negative, zero, or positive.A negative LLE indicates the convergence of two trajectories.The LLE of zero means on the average rate the two trajectories keep at the same distance of each other.A positive LLE implies the divergence of the trajectories, which is an indicator of chaos.In other words, a positive LLE expresses sensitive dependence on initial conditions by presenting the average rate over the whole attractor, at which two nearby trajectories become exponentially separate with evolution.A practical numerical technique for calculating the LLE is the method developed by Rosenstein et al. [41], which works well with small datasets; and is robust to changes in the embedding dimension, reconstruction delay, and noise level.Therefore, this method was adopted in this study to calculate the LLE, and briefly described as follows [52,53].
Given a time series or sequence of length N, the first step is to reconstruct the phase space of the dynamical system using the time-delay method [54].Let m and L be the embedding dimension and time delay.The reconstructed phase space can be expressed in matrix form as where X is matrix of size M × m, M = N − (m − 1)L, and X i = (x i , . . ., x i+(m−1)L ) which is the state of the system at discrete time i.
where |j − j * | > MP where MP is the mean period which is the reciprocal of the mean frequency of the power spectrum.
Entropy 2015, 17, 8130-8151 The basic idea is that the LLE (λ 1 ) for a dynamical system can be defined as [41] where d(t) is the average divergence of two randomly chosen initial conditions at time t, and c is a constant that normalizes the initial separation between neighboring points.By the definition given in Equation ( 32), the j pair of nearest neighbors can be assumed to diverge at a rate measured by λ 1 as follows: where d j (i) is the distance between the j pair of nearest neighbors after i discrete-time steps which is i∆t, ∆t is the sampling period of the time series, and c j is the initial separation between two neighboring points.
Taking the logarithm of both sides of Equation ( 33), giving where 34) gives a set of approximately parallel curves, one for each j (j = 1, . . ., M).If these curves are approximately linear, their slopes represent the LLE (λ 1 ).
The LLE can be computed as the slope of a straight-line fit to the average logarithmic divergence curve defined by where < • > j denotes the average over all values of j.

Dynamic-Time Warping
Dynamic-time warping (DTW) has been well known as a good method for comparing the similarity between two feature-based signals or sequences [44].For DTW, the pairwise comparison does not require that two sequences are of the same length that is a condition for the Euclidean metric.DTW can nonlinearly warps by stretching or compressing the sequences to determine their best match based on certain optimal criteria.DTW can be generally described as follows [53].

Monotonicity property such that
This means a valid path must follow a monotonic order with respect to time.

Continuity condition by setting
. This is also known as step-size constraints.
Given the above conditions imposed for the working of the DTW, an optimal quantification of pattern similarity between two feature vectors r and t is carried out on the basis of a defined local cost function by means of some local distance measure δ(r n , t m ), such that δ : F × F → R ≥ 0. Smaller value of δ(r n , t m ) indicates feature elements r n and t m are more similar to each other.Finally, the optimal warping path is determined as the one whose total induced cost over all possible warping paths is minimum, that is where w * denotes the optimal warping path.
As DTW is based on the principle of dynamic programming (DP) that can provide elegant solutions to optimization problems, DP involves in making sequential decisions that must be made at various stages.Therefore, DP is formulated using the framework of a stage variable, state variable, and decision variables [55].In order to prevent a DP algorithm from searching a large space of stages and states, which can be very computationally expensive, warping paths are usually limited to a search space in terms of a warping window and slope constraint, which are described as follows.
1. Warping window: By setting |n k − m k | ≤ ω, where ω is a positive integer representing the window bandwidth.This restriction means that only features within a warping-path window are considered.2. Slope constraint: By introducing a slope-weighting vector (φ H , φ V , φ D ), where φ H , φ V , and φ D are the weights for the horizontal, vertical, and diagonal directions, respectively.The purpose of this slope constraint is to avoid having a warping path that is either too steep or too shallow, and prevent matching very short segments with very long ones.
Based on the DP formulation, the cumulative distance for each point of a warping path, denoted as γ(n k , m k ), is computed using the following backward recursion: Similarly, the optimal warping path w * can be found by back-tracking the index order, selecting the point with the lowest accumulative distance.Starting with the endpoint w * K = (N, M) that has the smallest accumulative distance, the back-tracking algorithm for finding the optimal warping path is described as follows.

Phylogenetic Tree Reconstruction
Phylogenetic tree reconstruction is a visualization method for studying dissimilarity between genetic data based on the assumption that genomes change or diverge at a slow rate due to accumulated mutations of nucleotides.Thus, the degree of similarity between two genomes can be assessed by quantifying the amount of their base differences: two genomes sharing a recent ancestor would be more similar than those of more distant descendants.The topology of the reconstructed phylogenetic tree can provide insights into relationships between different genomes.In general, a phylogenetic tree shows relationships among species, organisms, or objects for taxonomic purposes of their evolutions.The leaf nodes of the tree structure represent descendants, whereas the internal nodes and root of the tree stand for the common ancestors.Two descendants that belong to the same node or clade are called sister groups, which are in contrast to the outgroup.The tree branches represent the distances between the objects.Thus, by plotting the tree using the feature distances of the MRI-based brain age groups, their relationships can be easily visualized and validated.The notion of phylogenetic tree reconstruction has been described in [9,56] for studying age-related diseases.
Phylogenetic tree can be constructed with the choice of various clustering models.A popular procedure is the use of a similarity matrix based approach, such as the unweighted pair-group Entropy 2015, 17, 8130-8151

Results and Discussion
The sample entropy (SampEn), regularity dimension (RD), recurrence plots (RP), largest Lyapunov exponent (LLE), and dynamic-time warping (DTW) methods were used to obtain featured distances between the brain-age groups from the brain time-series data to construct the corresponding phylogenetic trees.
Using m = 1, 2 and 3, and the tolerance r = 0.2 for computing SampEn values; three corresponding trees were constructed and shown in Figure 2. Table 1 shows  Utilizing SampEn values, with m = 1, 2 and 3, D r values were obtained to establish the trees of relationships among the brain-age groups, which are shown in Figure 3.For each fixed value of m, r was decreased from 0.5 to 0.05 with an interval of 0.05 [16].The slopes of SampEn versus log(1/r) were taken at the first five points, which form a straight line, to calculate D r .Table 2    The trees provided by the RP, with = 5, using its recurrence rate (RR) feature and m = 1, 2 and 3, are shown in Figure 4a-c, respectively.Table 3 shows the means and variances of RR values for m = 1, 2 and 3.While the group of over 80 years old can be well separated from other groups in all the trees; the groups of 20-29 and 40-49 and 50-59 belong to the same clade for m = 1; the groups of 20-29 and 60-69 are located in the same clade, also the groups of 30-39 and 50-59 belong to the same clade in the trees for m = 2 and 3.

Age Group
20-29 (0.0187, 0.0000) (0.0161, 0.0000) (0.0154, 0.0000) 30-39 (0.0192, 0.0000) (0.0167, 0.0000) (0.0159, 0.0000) 40-49 (0.0189, 0.0000) (0.0165, 0.0000) (0.0158, 0.0000) 50-59 (0.0189, 0.0000) (0.0166, 0.0000) (0.0159, 0.0000) 60-69 (0.0181, 0.0000) (0.0158, 0.0000) (0.0152, 0.0000) 70-79 (0.0176, 0.0000) (0.0154, 0.0000) (0.0147, 0.0000) 80-86 (0.0164, 0.0000) (0.0142, 0.0000) (0.0136, 0.0000) Using the recurrence entropy (RE) feature of the RP with m = 1, and l min = 2 and 6, the relationships of the brain-age groups are shown in Figure 4d,e, respectively.The groups of 60-69 and 80-86 are assigned to the same clade in both sub-plots.Using the recurrence-entropy feature of the RP with m = 2 and l min = 2 and 6, the relationships of the brain-age groups are shown in Figure 4f,g, respectively.For l min = 2, the groups of 60-69 and 80-86 are in the same clade.A reasonable result of the tree construction is obtained using l min = 6.Increasing m to 3, the corresponding Figure 4h,i show that the RP-based assignment of the age group of 80-86 is confused with the younger age groups (closer to 60-69 for l min = 2, and closer to 40-49 for l min = 6).Tables 4 and 5 show the means and variances of RE values for l min = 2 and 6, with m = 1, 2 and 3; respectively.On the tree reconstruction using the LLE feature, Figure 5 shows the relationships of the chronological brain-age groups for m = 1, 2 and 3.The time delay L was selected to be 1, and the first five points of the divergence curve were used to calculate the slope of a straight line as the value of the LLE.Table 6  On the use of the DTW for the phylogenetic tree reconstruction of the brain-age groups, the matching costs were calculated by adopting the Itakura and Sakoe-Chiba local constraints [64].Figure 6a-c show the DTW-based trees using the Itakura cost constraints, using the three longest (first, second, and third) time series of the GM boundaries of each age group for pattern matching, respectively.Figure 6d-f show the DTW-based trees using the Sakoe-Chiba cost constraints, using the three longest (first, second, and third) time series of the GM boundaries of each age group for pattern matching, respectively.Figure 6a    Regarding the values given in Tables 1-6, the SampEn means tend to increase with the older-age groups, and become smaller with larger values for m.For m = 1, the means of RD get larger with the older-age groups; but for m = 3, the RD values are smaller for the older ages, except for the cohort of 80-86 years of age; and there is no clear trend of increasing or decreasing of the RD with the brain ages.The RD values are smaller for larger values of m.All RR and RE values are smaller for larger m.Except for 80-86 year-old cohort, the RE decreases with increasing ages of the brain.Except for the 20-29 year-old cohort for m = 1, and 20-29 and 50-59 year-old cohorts for m = 2 and 3, the RR tends to decrease with older age groups.All the means and variances presented in Tables 1-6 are truncated to four decimal digits, which make the RR variances equivalent to zero.Such small values of the RR variances can be due to both the insufficient samples of MRI data and the computational method.
In summary, with the given selection of parameters for each method, the SampEn-based trees, RD-based tree with m = 2, RP-based tree using the RE with m = 2 and l min = 6, and DTW-based tree using the Sakoe-Chiba constraints on the second largest GM-generated time series yield reasonable results to represent the chronological relationships of the healthy brain MRI groups.Among these mentioned reasonable tree topologies, those obtained from SampEn seem all valid for expressing the relationships of the brain age groups, and therefore the SampEn is the most preferred method with the currently available MRI data.A common problem encountered with the computational methods studied here in assigning the group of 80-86 years of age to an appropriate branch of the tree is likely due to the limited data of the age group, which consists of only eight subjects.Other issues are due to data requirements and parameter selections imposed by different methods, which need to be further investigated to ensure their optimal implementations.For example, the LLE is a method for quantifying chaos (a positive LLE indicates chaos), the concept of the LLE is to keep track of two nearby orbits to determine the average logarithmic rate of separation of the orbits.If the two orbits depart too far from each other, one of them has to be adjusted back to the vicinity of the other along the line of separation, and this adjustment is known to be the most difficult and error-prone step for calculating the LLE [65].When underlying equations for chaos are available, the numerical calculation of the LLE is relatively simple; but for experimental data, such a calculation is very difficult [65].In this study, MRI-based time-series data were analyzed using the Rosenstein method [41], which is particularly developed to compute the LLE from small recorded time series and robust to the choice of embedding dimension, reconstruction delay, and noise.In this study, the first five points of the divergence curves were used to estimate the LLEs using the least-squares fit.A practical reason for the poorer presentation of the brain-age relationships via the LLE-based tree topologies than other feature-based tree topologies is due to the inconsistency of the numbers of divergence points representing a straight line for the LLE calculation.Therefore, some method that can adaptively determine suitable numbers of points on the divergence curves for different MRI-based time series would be needed to improve the diagnostic performance of the Rosenstein method.
Finally, the idea is that if a valid tree structure that represents the chronological relationships within a group of brain ages can be constructed, the physiological age of a brain on MRI can be reliably predicted by assigning its physiological age to the age group whose feature-based distance is smallest among others.Furthermore, the constructed brain-age tree can be always updated and extended when more MRI data representing various age groups become available.

Conclusions
Aging is thought as the combined result of physical, biological, chemical, and psychological changes.Evidence from computed tomography imaging studies suggested that the cerebral ventricles dilate as a function of age, which is known as ventriculomegaly [66].More recently, an MRI study found that regional areas of the brain of age-related disorder decreases in cerebral volume [67].However, regional volume reduction is not uniform.Some brain regions shrink, whereas others remain relatively stable until the end of the life-span [14].The structure and function of the brain are known to be very complex.Here, several methods of nonlinear dynamics and chaos, which are well recognized for the analysis of complexity, have been carried out to test their feasibility for an objective measure of the similarity and relationships of the brain-age groups using their MRI data.
Because of limited MRI data, the brain-age gap adopted in this study is 10 years to test the feasibility of the presented methods.Such an age gap can be large for applications in clinical settings.When new samples become available, the tree of brain-age groups can be reconstructed with smaller age gaps, or included in the existing tree model, whose new topology will be reassessed for validity of the relationships before putting the model to a practical use.Furthermore, not only the physiological MRI-based brain age can be predicted among the brain-age groups, inference about the brain under study from the established trees can also be made to discover a possible pattern of the new data.Such a proposed conceptual framework departs itself from other related studies that attempt to predict the brain age based on training data and classifiers.
Being different from the perspective of pattern classification, the purpose of the proposed methodology for grouping brain ages is to secure the reliability of the brain age prediction by examining relationships of the brain age groups, regardless of the age gaps.Validation of the proposed approach can be obtained by the visual inspection of the tree topology under study.Since there can be several valid tree topologies provided by the same or different methods, a question of interest is to determine which topology of relationships is the best.An answer to this question can be problem-dependent, may rely on the knowledge of the anatomy of the brain and clinical trials.
In general, chaos and nonlinear dynamics methods and their potential combination can be useful tools to extract discriminative features related to the variability of the complexity and dynamics of the brain structure on MRI for grouping brain ages, in which the physiological brain age of a subject with neurodegenerative disease can be detected if it is found older than the chronological age based on the matching of features between the constructed brain-age groups and the individual.There is another issue worth considering.Since the brain has many different areas and types of tissue (GM and WM), the vulnerability of different functions of the GW and WM of the brain to age-induced changes may not be consistent because GM consists of cell bodies in the cortex and subcortical nuclei, whereas WM consists of tightly packed myelinated axons connecting the neurons of the cerebral cortex to each other and with the periphery [68].Only the structural information of the GW is taken into account in this study.The inclusion of both types of the brain matter and extraction of features out of regional areas of the brain would be expected to gain further insight into the relationships of chronological brain ages captured on MRI.

Figure 1 .
Figure 1.Examples of magnetic resonance imaging (MRI) scans (left) and time series of associated GM morphology (right) of a 20 year-old (a), 40 year-old (b), 60 year-old (c), and 80 year-old (d) healthy brain subjects; where each plot of the time series shows pixel-based distances measured from the gray-matter center of the brain (y-axis) to sequential outer boundary points (x-axis).
the means and variances of SampEn values for m = 1, 2 and 3.For m = 1, the tree topology provides a desirable result by partitioning the brain ages into two distinct clades of 20-59 and 60-86 (a clade represents a single branch on the tree to indicate a population having a common ancestor), in which 20-29 and 30-39, 40-49 and 50-59, 60-69 and 70-79 are sister groups.For m = 2, the relationships of the brain ages are well divided into three groups: 20-29 and 30-39, 40-49 and 50-59, and 60-69 and 70-79 and 80-86, where the latter two groups are closer to each other.As for m = 3, 20-29 and 30-39, and 40-49 are closer to 50-59, 60-69 and 70-79, while 80-86 is considered as an outgroup.In general, the three tree topologies shown in Figure 2 display reasonable relationships of the brain ages.
shows the means and variances of D r values for m = 1, 2 and 3.For m = 2, the RD-based trees appropriately assign the relationships of the brain-age groups by separating the clade of 20-29 and 30-29 from the other brain-age groups, and the clade of 40-49 and 50-59 from the clade of 60-69, 70-79, and a single branch for 80-86.The RD-based tree, with m = 1, shows the grouping of the chronological brain ages into well separated clades, except the groups of 70-79 and 40-49 belong to the same clade.For m = 3, the RD-based tree locates the groups of 80-86 and 40-49 to the same clade, while other groups are appropriately partitioned.

Figure 2 .
Figure 2. SampEn-based constructed trees of relationships of MRI-based healthy brain age groups, with m = 1 (a), m = 2 (b), and m = 3 (c); where values on the x-axis of each plot are the corresponding feature-based distances.

Figure 3 .
Figure 3. RD-based constructed trees of relationships of MRI-based healthy brain age groups, with m = 1 (a), m = 2 (b), and m = 3 (c); where values on the x-axis of each plot are the corresponding feature-based distances.

Figure 4 .
Figure 4. Recurrence plot (RP)-based constructed trees of relationships of MRI-based healthy brain age groups, where recurrence rate (RR) used as feature with m = 1 (a), m = 2 (b), and m = 3 (c); recurrence entropy (RE) used as feature with m = 1 and l min = 2 (d), m = 1 and l min = 6 (e), and m = 2 and l min = 2 (f); recurrence entropy (RE) used as feature with m = 2 and l min = 6 (g), m = 3 and l min = 2 (h), and m = 3 and l min = 6 (i).Values on the x-axis of each plot are the corresponding feature-based distances.
shows the means and variances of LLE values for m = 1, 2 and 3.It can be seen from all the three figures that the LLE-based trees tend to distinguish the clade of the age groups of 40-49, 50-59, 60-69, 70-79, and 80-86 from the clade of 20-29 and 30-39.In general, the LLE-based tree topologies show the groups of similar ages are more closely related to each other than they are to the outgroup, except for m = 1, the group of 80-86 is closer to the clade of 40-49 and 50-59 than the clade of 60-69 and 70-79, and for m = 3, 80-86 is slightly closer to 50-59 than the clade of 60-69 and 70-79.
puts the group of 80-86 closer the clade of 20-29 and 40-49, and assigns 30-39 and 50-59 to the same clade.Figure 6b can separate the groups of 60-69, 70-79, and 80-86 from those of 20-29, 30-39, 40-49, 50-59 by assigning these two groups into two distinct clades; however, 20-29 is closer to the clade consisting of 40-49 and 50-50 than 30-39.Figure 6c shows that 30-39 is closer to the clade consisting of 50-59 and 60-69, and 20-29 is closer to 40-49.On the adoption of the Sakoe-Chiba local constraints, Figure 6d assigns 20-29 and 40-49 to the same clade.The topology of the tree shown in Figure 6e represents reasonable relationships between the age groups, and distinguishes the age groups of 20 to 59 from those of 60 to 86.The age group of 20-29 is the outgroup to the others for the DTW-based tree using the Sakoe-Chiba local constraints as can be seen from Figure 6f.

Figure 5 .
Figure 5. LLE-based constructed trees of relationships of MRI-based healthy brain age groups, with m = 1 (a), m = 2 (b), and m = 3 (c); where values on the x-axis of each plot are the corresponding feature-based distances.

Figure 6 .
Figure 6.Dynamic-time warping (DTW)-based constructed tree of relationships of MRI-based healthy brain age groups using Itakura local constraints based on: largest (a), second largest (b), and third largest (c) gray matter (GM)-generated time series of brain MRI; Sakoe-Chiba local constraints based on: largest (d), second largest (e), and third largest (f) GM-generated time series of brain MRI.Values on the x-axis of each plot are the corresponding feature-based distances.