Modified Maximum Entropy Method and Estimating the AIF via DCE-MRI Data Analysis

Background: For the kinetic models used in contrast-based medical imaging, the assignment of the arterial input function named AIF is essential for the estimation of the physiological parameters of the tissue via solving an optimization problem. Objective: In the current study, we estimate the AIF relayed on the modified maximum entropy method. The effectiveness of several numerical methods to determine kinetic parameters and the AIF is evaluated—in situations where enough information about the AIF is not available. The purpose of this study is to identify an appropriate method for estimating this function. Materials and Methods: The modified algorithm is a mixture of the maximum entropy approach with an optimization method, named the teaching-learning method. In here, we applied this algorithm in a Bayesian framework to estimate the kinetic parameters when specifying the unique form of the AIF by the maximum entropy method. We assessed the proficiency of the proposed method for assigning the kinetic parameters in the dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI), when determining AIF with some other parameter-estimation methods and a standard fixed AIF method. A previously analyzed dataset consisting of contrast agent concentrations in tissue and plasma was used. Results and Conclusions: We compared the accuracy of the results for the estimated parameters obtained from the MMEM with those of the empirical method, maximum likelihood method, moment matching (“method of moments”), the least-square method, the modified maximum likelihood approach, and our previous work. Since the current algorithm does not have the problem of starting point in the parameter estimation phase, it could find the best and nearest model to the empirical model of data, and therefore, the results indicated the Weibull distribution as an appropriate and robust AIF and also illustrated the power and effectiveness of the proposed method to estimate the kinetic parameters.


Introduction
Determining the probability density function of a random variable based on observations is a major and old issue in statistics. In recent years, various parametric and non-parametric methods have been introduced for the estimation of the probability density function for a random variable based on observations, but there is very limited work reported on the optimization methods. The maximum entropy method (MEM) is one of the major methods for estimating and determining the probability density with a high level of accuracy and efficiency and minimum bias. It is applied to gain the unknown density via resolution of an optimization problem. The principle of maximum entropy, as a method of statistical inference, is due to Jaynes [1]. His idea is that this principle leads to the selection of a probability density function that is consistent with our knowledge and introduces no unwarranted information. Any probability density function satisfying the constraints that has smaller entropy will contain more information (less uncertainty), and thus says something stronger than what we are assuming [1][2][3][4]. Entropy maximization or related concepts has been frequently utilized in the past ten years to analyze large biological datasets in various fields. These fields range from determining macromolecular interactions and structures [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20] to inferring signaling [21][22][23][24][25] and regulatory networks [26][27][28] and the coding organization in neural populations [28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43] based on DNA sequence analyzes (the detection of specific binding sites, for instance) [42][43][44][45][46]. MEM is a powerful vehicle to reconstruct images based on various datasets. It is also commonly used in radio astronomical interferometry, which deals routinely with images with high dynamic range and up to a million pixels [47,48].
In this paper, a concise and basic introduction to entropy maximization and its applicability for deriving models from biological datasets-especially in kinetic models and image processing via DCE-MRI-is provided. DCE-MRI is a fast and noninvasive method for quantitative perfusion analysis in soft tissue, using a contrast agent (CA). It is used extensively for the analysis of microvascular blood in a variety of clinical usages like detecting, characterizing, and therapeutic monitoring of different diseases [49][50][51][52][53][54]. In DCE-MRI, quantitative analysis is often applied on a whole tumor region of interest (ROI) [55], in which the contrast agent concentration time curve for all voxels in the tumor is used to estimate a single set of kinetic parameters (like-volume transfer constant between blood plasma and extracellular extravascular space per minute-and-rate constant between extracellular extravascular space and blood plasma per minute [56]) for each patient study.
In recent years, several approaches have been proposed to quantify the perfusion of CA into tissues and to estimate the related parameters of perfusion (indices) from concentration-time or signal-curves [57][58][59][60]. In DCE-MRI, quantification of the perfusion involves measuring the concentration of the CA in tissue over time. This time curve is then modeled using kinetic processes, where the kinetic parameters are of clinical interest [55]. The kinetic model of the tissue is explicit as an ordinary-differential equation, dissolved analytically result in a nonlinear format in the contrast agent concentration [61][62][63][64][65][66][67][68][69].
One fundamental necessity for the Tofts pharmacokinetic analysis is the knowledge of the arterial input function (AIF), that is the time curve of CA concentration on the left-side ventricular blood pool. Although the AIF itself is not clinically relevant, its correct determination is very important for the correct estimation of kinetic parameters [62,70]. Since the obtained rate constants are heavily dependent on the AIF [71][72][73][74][75], an accurate and precise measurement is necessary for their absolute and reliable quantification. Instead, a simplified method, such as a population averaged AIF can be used. However, (large) variabilities in cardiac output-between patients and within patients over time-are no longer taken into account with this method. If this variability in cardiac output can be accounted for by precise measurement of the AIF, the accuracy and repeatability of the kinetic parameters should be superior over use of a population averaged AIF. Some researchers have shown that a population averaged AIF can result in better repeatability [76,77], whereas others report the opposite [78,79]. It is possible that repeatability depends on the imaged body part and imaging sequence parameters, but also on the choice of the artery for AIF measurement.
However, in many imaging applications, e.g., for patients with breast cancer, it is not possible to directly measure the AIF from imaging, as no large vessel is in the field of view. Thus, assumed AIFs from the literature are often used, e.g., bi-exponential functions with parameters derived by [80] or [81] or a mix of the two Gaussian with an exponential [61][62][63][64][65][66][67][68][69]76].
We have previously developed a method for estimating the AIF and the kinetic parameters in DCE-MRI [82]. This method was developed in response to a need in the medical imaging community for the objective comparison of estimations made using different statistical methods, for example, the Bayesian method and MLE. The main problem of our previous algorithm was the dependence of Newton's method on the starting point, which was a uniform random number.
The previous algorithm was the maximum entropy method in combination with Newton's method in order to estimate the AIF using the CA time curve data in plasma, also called "blind estimation" of the AIF and the maximum a posterior approach (MAP) to determine the kinetic parameters [82]. In this paper, we propose an improved algorithm for blind AIF estimation using a mixture of the maximum entropy method (MEM) and teaching-learning based optimization in the step of λ's estimation for assessing observer performance in the classification tasks using available information. This circumvents issues with the random start points in the previous algorithm. This approach works on the effect of the influence of a teacher on students. Like other nature-inspired algorithms, TLBO is also a population-based method and uses a population of solutions to proceed to the global solution. The population is considered as a group of learners or a class of learners. The process of TLBO has two parts: the first part consists of the 'Teacher Phase' and the second part consists of the 'Learner Phase'. 'Teacher Phase' means learning from the teacher, and 'Learner Phase' means learning by the interaction between learners [83,84]. The proposed algorithm is, therefore, more robust. It finally proposed the Weibull distribution in between all other selective models as a model for the AIF via MEM approach. We performed extensive studies using empirical data to better understand the performance of our method. In addition, a comparison was conducted among four other different estimation methods in DCE-MRI dataset, and the new recommended method results were compared with the previous work [55,82].
The rest of this paper is structured as follows: Section 2 describes the basic structure of the proposed method for DCE-MRI analysis: the proposed modified maximum entropy method (MMEM), TLBO algorithm, and MAP. Section 3 includes alternative approaches for estimating of the parameters. Section 4 gives an example to show the step-by-step analysis of the dataset. Section 5 contains the application of a complete DCE-MRI study using the proposed method and their evaluations. Section 6 concludes the aim of the present work.

Data Description
As an example data set, we use a previously analyzed breast cancer data set [55]. The data were provided by the Paul Strickland Scanner Center at Mount Vernon Hospital in Northwood, UK. Pre-treatment DCE-MRI scans of twelve patients were available. In each case, 46 images were recorded every 11.9 s after administration of the contrast agent Gadolinium-DTPA. For the calculation of T 1 values, we used a two-point measurement with calibration curves as described in [85,86]. In DCE-MRI, T 1 is the relaxation time, also known as the spin-lattice relaxation time. In addition, it is a measure of how quickly the net magnetization vector recovers and its ground state or it is the time constant for regrowth of longitudinal magnetization. The T 1 values are computed as a ratio of a T 1 -weighted fast low-angle shot (FLASH) image and a proton-density-weighted FLASH image.
To measure contrast agent concentration C t (t), the signal intensity is converted to T 1 relaxation time values using T 1 -weighted images, proton density weighted images and data from calibration phantoms with known T 1 relaxation times [87]. The Gd-DTPA concentration can then be computed via C t (t) = 1 , where T 10 is the T 1 value without contrast, computed as mean value of the first four images, and r 1 = 4.24l/s/mmol is the longitudinal relativity of protons in vivo due to Gd-DTPA. The imaging parameters of the T1-weighted FLASH images were TR = 11 ms, TE = 4.7 ms, α = 35, the parameters of the proton density-weighted image were TR = 350 ms, TE = 4.7 ms, α = 6. Field of view was the same for all scans, 260 × 260 × 8 mm per slice, so voxel dimensions are 1.016 × 1.0168 mm. A scan includes three sequential slices of 256 × 256 voxels and one slice placed in the contralateral breast as control, which we do not use for our analysis. A dose of D = 0.1 mmol per kg body weight of Gd-DTPA was injected after the fourth scan via a power injector with 4 mL/s with a 20 mL saline flush also at 4 mL/s. Figure 1 shows the empirical model of data for two patients C tis−1 , C tis−2 and C p based on the time t using Kernel distribution which actually finds an empirical density function of the sample data (See: Using "KernelDistribution" Objects and "ksdensity" in Matlab). In statistics, kernel density estimation is a non-parametric way to estimate the probability density function of a random variable and it is a smoothing function that determines the shape of the curve used to generate the pdf, and a bandwidth value that controls the smoothness of the resulting density curve [88]. This is the primary model of data C p (t) changing the inverse problem to forward. It is not clear the sample data belongs to which family of distributions, it is the main issue in here we try solving here by the maximum entropy method. Fig.1 shows the empirical model of data for two patients C tis−1 , C tis−2 and C p based on the time t using Kernel distribution which actually finds an empirical density function of the sample data (See: Using "KernelDistribution" Objects and "ksdensity" in Matlab). In statistics, kernel density estimation is a non-parametric way to estimate the probability density function of a random variable and it is a smoothing function that determines the shape of the curve used to generate the pdf, and a bandwidth value that controls the smoothness of the resulting density curve [89]. This is the primary model of data C p (t) changing the inverse problem to forward. It is not clear the sample data belongs to which family of distributions, it is the main issue in here we try solving here by the maximum entropy method.

Theory & Methods
In this study, we estimated different probability density functions for AIF using our proposed algorithm. The accurate estimated model of AIF is the cornstone of the present work, because its correct determination is very important for the correct estimation of kinetic parameters. In the previous study [83], we have estimated the Gamma and Exponential distributions using the maximum entropy method and Newton's approach. Both models were acceptable approximated AIF in comparison to the literature models.
Here, the modified algorithm examines different moments constraints in MEM to build the best probability model fit to data. In addition, Besides TLBO, the various parameter estimation methods then suggest the estimated parameters which help to find more appropriate AIF. Kinetic parameters are estimated in the next step via MAP.

Kinetic Model
The kinetic process in the tissue can be modeled using an ordinary-differential equation, dissolved analytically resulting in a nonlinear model for the contrast agent concentration ( [61], [62], and [63]), [64][65][66][67][68][69]. In this study, we adopted the commonly used pharmacokinetic model [90] that assumes the CA resides in and exchanges between two compartments in the tissue: the vascular space and EES.
Considering the kinetic properties of the contrast agent (CA) in the tissue of interest (C tis ) using DCE-MRI, we apply the differential equation system as follows: Figure 1. Empirical PDF of the contrast agent in plasma (C p (t)) named empirical AIF and in tissue (C tis (t)) for two patients.

Theory and Methods
In this study, we estimated different probability density functions for AIF using our proposed algorithm. The accurate estimated model of AIF is the cornerstone of the present work, because its correct determination is very important for the correct estimation of kinetic parameters. In the previous study [82], we have estimated the gamma and exponential distributions using the maximum entropy method and Newton's approach. Both models were acceptable approximated AIF in comparison to the literature models.
Here, the modified algorithm examines different moments constraints in MEM to build the best probability model fit to data. In addition, besides TLBO, the various parameter estimation methods then suggest the estimated parameters which help to find more appropriate AIF. Kinetic parameters are estimated in the next step via MAP.

Kinetic Model
The kinetic process in the tissue can be modeled using an ordinary-differential equation, dissolved analytically resulting in a nonlinear model for the contrast agent concentration [61][62][63][64][65][66][67][68][69]. In this study, we adopted the commonly used pharmacokinetic model [89] that assumes the CA resides in and exchanges between two compartments in the tissue: the vascular space and EES.
Considering the kinetic properties of the contrast agent (CA) in the tissue of interest (C tis ) using DCE-MRI, we apply the differential equation system as follows: in which C p (t) is the CA concentrations in the vascular blood pool, that is, the arterial input function AIF. Both K 1 and K 2 are the rate constants of the CA exchanges between extravascular-extracellular space (EES) and plasma. Subject to C p (0) = 0, Equation (1) can be solved with the following result This equation has been used to analyze MR data in a number of studies [81]. Murase [90] proposed a different way to solve Equation (1) using discretization: This can be written in matrix form as follows: where matrix A n×2 : A = {A(1), . . . , A(n)} for each row of I = 1, 2, . . . , n : and From the mathematical view, when C tis (t i ) and C p (t i ) are measured, it is possible to use conventional linear least-squares (LLSQ) method to determine K and the trapezoidal rule for the elements of A. Unfortunately, this method measures approximate values for the kinetic parameters. A great number of image processing problems can be presented as inverse problems. In here the linear system of equations which are obtained after the discretization of the integral equations which arises in various tomographic image restoration and reconstruction concerns are considered: Therefore, we write Equation (2) as follows: where y tis (t i ) is the observed tissue concentration at time t i and the measurement uncertainty (noise) which is assumed to be additive, centered, white, Gaussian and independent of K [91]. For that, the estimation procedure of the analysis here includes Bayesian methods, which its advantage is to overcome the integration process involved in estimating the model parameters. Therefore, Bayesian methods can provide exact estimates of the model parameters, not approximating them. In Bayesian statistics, parameters are viewed as random variables. Each parameter involved in a Bayesian model has a distribution attached to it in order to express the uncertainty about its true value. The distribution is named the prior distribution. They represent the prior knowledge about the parameter of interest, which is often obtained from historical data (data-based priors) [92,93].

Maximum a Posterior Approach
Image reconstruction belongs to the class of ill-posed inverse problems of mathematical physics [94]. In 1967, the physicist V. Turchin suggested using the Bayesian method of maximum a posteriori (MAP) for solving inverse ill-posed problems with stochastic data, naming this approach 'statistical regularization' [95]. Bayesian maximum a posteriori (MAP) approaches can be used to solve ill-posed problems as they arise in image reconstruction [96,96,97]. The solution of MAP obviously depends on the priori models. The main challenge of the Bayesian method is how to determine the a priori probability distribution of the studied image and specify its parameters using its data. In here, we assume a form of a priori information named entropy-based prior, which relies on the principle of entropy. Such approaches have been successfully used in the fields of plasma-tomography, X-ray, radio, and gamma-astronomy [93,[96][97][98]. To estimate the kinetic parameters, we consider the general form of Equation (8) in the following as proposed by [92,93]. Estimating the positive-vector x (the pixel-intensities in an object) subject to a vector of measurement y Entropy 2022, 24, 155 6 of 18 (e.g., a degraded-image or the projections of an object) and a linear-transformation A which relates both vectors by where b is uncorrelated noise with normal distribution, and zero-mean. We consider only approximate information about the variance of the noise Σ 2 and general information about the object.
To estimate the unknown vector x, we use a Bayesian approach. Given the probability density functions (pdf) f (x) and f (y|x) and f (y) we obtain the pdf of the conditional distribution of x subject to y using the Bayesian formula [99]: The MAP estimatorx maximizes the posterior pdf f (x|y) obtained by Bayes formula. In the Equation (10), f (y) is independent of x, f (y|x) relates to the noise probability distribution and finally, f (x) is a prior distribution on x.
If we are not able to directly determine f (x) and f (y|x), we can apply the maximum entropy method. For the MEM, knowledge of some constraints on f (x) can be used. Among all probability distributions satisfying these constraints, we select the one which has maximum entropy [92,92,93], see Section 2.3. To determine f (y|x), the noise pdf, we have A possible way to select a priori-distribution, f (x), is to apply the MEM where the general model is in the form of the exponential family. The advantage of applying MEM for finding a priori is, that this method is the most objective, and maximally uncommitted [99].

Maximum Entropy Method
The maximum entropy principle allows one to determine the least biased probability distribution function when the information available is limited by some macroscopic constraints [100]. The MEM determines the randomness of the primary data by the concept of information entropy. It is the mathematical expectation of the uncertainty associated with an outcome in terms of its occurrence probability. It is suggested that the most likely probability distribution should be the one that maximizes the information entropy. Maximizing Shannon's entropy is the basic idea of MEM [4]: subject to known constraint, typically moment constraints Here φ 0 (x) = 1, and φ k (x), k = 0, . . . , N are N + 1 known-functions. These could be, for example, x n ,log(x), x log(x) or trigonometric or geometric functions. The idea is to assign the appropriate known-function relating the ME distribution to the exponential family via the mentioned constraints [2,3,101]. Using the method of Lagrange multipliers method, where the objective function is Shannon's entropy Equation (11), J( f ) is as follows: For obtaining f (x), we differentiate J subject to f (x): Adopting Taylor's theorem (based on using a Taylor series approximation), the required expected values µ 1 , . . . , µ m can be obtained numerically from the data set [101]. Applying an optimization method with Shannon's entropy as the objective function, and setting Equation (14) equal to zero, the general form of ME distribution will be [4]: where λ k can be selected such that, f (x) in Equation (15) satisfies the known constraints in Equation (12). The parameters λ = [λ 0 , . . . , λ N ] are determined to fit an appropriate class of the ME distributions. To determine the N + 1 unknown-parameters, the set of N + 1 nonlinear equations are solved as follows (1 ≤ k ≤ m):

Teaching-Learning Based Optimization
To solve Equation (13) we suppose a population base algorithm named teachinglearning based optimization (TLBO), which resolves the problem of random starting points. Instead, it can measure the mean of all possible parameter estimations in order to fit a better model to data. The idea of this algorithm is to assume the relationships between a teacher and all learners in a classroom [83]. It considers the same probability for learners to get information from others. TLBO has two steps to find the best solutions, which named teaching and, learning. The most important features of this algorithm is the easy implementation, and rapid convergence [102].
It considered the population as a group of students and their related subjects are the design variables of the problem. In TLBO, the different presented courses for students are supposed to be different variables and their scores are like the objective function. It is very important for teachers to share their knowledge between students to improve the class's level of knowledge, then it can cause achieving acceptable scores by students based on their talents. The teacher is assumed to be the most informed person in the classroom, which distributes her/his knowledge to the students. For that, the teacher will be the best solution (the best person in the population) among all. There is a fact that the student's level of knowledge highly depends on the teacher's quality of the teaching, and the quality of the others in the class. Therefore, the two main steps of learning for the students are implemented After the creation of the initial population and calculation of the objective value for each individual as follows:

Teacher Phase
In this step, the teacher tries to improve the mean scores of the students subject to her/his situation. The random procedure here is to produce a new solution instead of the old one: where, D shows the number of courses, X old,D (a vector 1 × D) is the old solution, when there is no contribution between the students to improve their knowledge, and it includes the results of each specific course, a random number r is in [0, 1], X teacher,D is the best solution of the whole population, T f is a teaching factor changes from 1 to 2 randomly with the same probability, and M D is a vector ( 1 × D) involving the mean values of the classroom results for each specific course. The new solution X new,D is accepted if it is better than the old one [102].

Learner Phase
To improve the knowledge of each student when randomly cooperating with other students, Equation (18) is implemented to all of them, so that, one can achieve the new information in the situation when the other learner has more knowledge than her/him.
where i = 1, 2 is the total number of solutions, X old,i when there is no cooperation with other students, r i is a random number in the range [0, 1], and X j and X k are two learners randomly chosen with j = k and in which X j presents a better objective value than X k . The solution X new,i is accepted if it is better than the old solution X old,i .

Implementation
We have implemented the code for the proposed method in MATLAB. These are the main steps of the algorithm: (1) Determining φ k (x) and their numerical expectations using dataset via Taylor's theorem [4], (2) Using TLBO (or an alternative optimization method, see below) to determine the unknown function with the Shannon's entropy as target function. The general form is given in Equation (15), ( f C tis (t) and f C p (t)), (3) Applying the proposed method to find λ k in which f (x) (Equation (15)) matches the constraints in (Equation (12)), (f C tis (t) andf C p (t)), (4) Estimating the kinetic parameters K, we replacef C tis (t) andf C p (t) in Equation (8) and resolve them via MAP, (5) Using the Kullback-Leibler divergence D K−L ( f ||g) to check the accuracy of the estimated AIF,f C p (t) in comparison with the empirical distribution of dataset g(C p ),

Weibull Distribution
The Weibull distribution is widely used in reliability and life data analysis due to its versatility. It is also established a close approximation to the probability laws of various natural-phenomena. The pdf of the Weibull distribution has two parameters: where k is the shape and c is the scale parameter [103,104]. Using the Weibull model, we can also use different approaches to determine the parameters k and c:

Methods of Moments
The method of moments (MM) calculates the first and second moments to estimate shape and scale parameters. The algorithm relies on the mean, variance of the gammafunction of (1 + 1/k) [105]. The sample mean and standard error arē where Γ(x) = ∞ 0 t x−1 e −t dt is the gamma function.

Empirical Measurement Method
The empirical measurement method is a special case of MM [105][106][107]: where σ is the sample standard deviation.

Maximum Likelihood Method
For the maximum likelihood estimator (MLE), an iterative algorithm can be used. With n the number of non-zero data points, the shape and scale parameters (k, c) are by iteratively solving

Modified Maximum Likelihood Method
When the data is available in the form of the frequency distribution, we can apply the modified maximum likelihood method (MMLE). (26) where P(x i ) represents the frequency of data x i , n the number non-zero data points, and P(x ≥ 0) the probability of the random variable equal or exceeding zero. In Equation (26), k can be resolved iteratively, then c can be solved explicitly [108,109].

Non-Linear Least Squares Method
For the non-linear least squares method (NLSM) the observations are ordered in an ascendant form, and coupled to the failure probabilities, gained by the estimators. The Gauss-Newton's algorithm is used to gain the best-fitted curve of a Weibull model [110].
Based on the individual results by the method of maximum likelihood, modified maximum likelihood, and least-squares regression, Seguro and Lambert [111] concluded that maximum likelihood or modified maximum likelihood proposed here provided more reasonable and accurate values for parameter estimation of Weibull distribution than least-squares regression. Later it is proven once again by Cook [112].

Example of Application
In this section, we show the application of the modified maximum entropy method (MMEM). Due to the limitation of length, this paper only provides a brief description of formulas and fitting curves.
An additional challenge is the fact that usually there are an infinite number of statistical models that are consistent with a given set of global properties measured from data. Therefore, one needs an additional criterion to decide which model to use. The benefit of using the maximum entropy method is to find the simplest model with the lowest bias, which maximizes the entropy. For the same dataset, there could be many complicated models to describe data, but finding the best fitted model with a few numbers of constraints, a few steps of computations, and finally, with the known family of distribution, is the advantage of choosing this method. For that, the maximum entropy method, searches, examines and, applies different moment constraints (see Section 2.3) [3], and adopts the minimum number of them to form an appropriate probability density model for the sample data. There could be a large difference between the maximum entropy estimated model with two constraints and three constraints, but for the estimated ME model with four or five cases, it would not be a big difference in the estimated models. In other words, there is no guarantee to estimate the better maximum entropy model when adding more constraints.
After examining several known functions [3] and their exceptions (constraints) which are computed numerically based on Taylor's theorem from the sample data, the estimated probability density function of the data fits well to the Weibull distribution ( Figure 2).
using the general form of the maximum entropy distribution Equation (15),f C p (t) can be as follows: where the final ME multipliers λ s and the Weibull parameters are estimated as followŝ and based on the ME form of Equation (23) in which Then, according to Equations (27) and (28), the Weibull parameters will be c = 1.8498, k = 3 where the mean of absolute error, D K−L divergence and entropy are 0.0470, 0.0438, and 0.2026, respectively, see Figure 2. Table 1 lists the mean absolute error (MAE), Kullback-Leibler distance D K−L and the entropy of different AIF models via MMEM described in Section 2.3 and the empirical one. In each case, we have applied evaluation methods to check the validity of the estimated model. The high measurement of entropy shows the superiority of the Weibull probability density function to fit the data, too. The MMEM can not optimize the different values of k and c by itself (based on the uniqueness of the maximum entropy distribution [4]), but it can be applied on a grid on k. However, using MAE, D K−L and entropy, we achieve different optimal models with the optimal values of their parameters. Additionally, Figure 3 pictures the fit for the different CDFs. density function to fit the data, too. The MMEM can not optimize the different values of k and c by itself (based on the uniqueness of the maximum entropy distribution [4]), but it can be applied on a grid on k. However, using MAE, D K−L and Entropy, we achieve different optimal models with the optimal values of their parameters. Additionally Fig.3 pictures the fit for the different CDFs.   it can be applied on a grid on k. However, using MAE, D K−L and Entropy, we achieve different optimal models with the optimal values of their parameters. Additionally Fig.3 pictures the fit for the different CDFs.      Table 2 lists the estimated Weibull parameters via different estimation methods as mentioned in Section 3. All the estimated models are presented in Figure 4. Table 3 indicates the evaluation measurements for all the mentioned method in Section 3 to investigate how the proposed method works. Among all these models, the MMEM has the best fit to the data. the correct estimation of the kinetic parameters. The K-L divergence measurements range from 0.001 to 0.0637 for all patients.  (Table 4) are more on the same level between patients -compared to the MEM/MAP -which biologically makes sense. Still, the AIF is estimated from the data, which makes the estimation of k more realistic than the estimation using an assumed AIF.   Based on the results of Tables 1-3 MMEM has achieved a much better fit model to the data. Actually, Table 2 shows the parameter estimations of the Weibull distribution via different methods in comparison to those via MMEM to see in which case the estimated model fits well to the data. In (Table 3), we examined the results via the root mean square error (RMSE), the goodness of fit (χ 2 ), determination coefficient (R 2 ) and the adjust determination coefficient (R 2 ) which highlights the MMEM. The proposed MMEM gives the estimation with the lowest absolute error and D K−L divergence with the highest entropy.

Evaluation
To better evaluate the desired method (MMEM), we considered the data of 12 more patients in total. MMEM and MAP were utilized to estimate the AIF and the kinetic parameters, and the results were compared accordingly, (Figures 5 and 6). The difference within empirical AIF and estimated AIF via the MMEM is clear.
Actually, the AIF in the first two minutes is typically estimated higher than the assumed AIF, whereas there would be negligible difference after about two and a half  (Table 4) are more on the same level between patients-compared to the MEM/MAP-which biologically makes sense. Still, the AIF is estimated from the data, which makes the estimation of k more realistic than the estimation using an assumed AIF.

Discussion & Conclusions
The main purpose of this study is to connected an important problem in statistics which is to determine the probability density function of a random variable based on observations to an important problem in image processing which is to determine the AIF in situations where not enough information about the AIF is available and then accurate estimation of the kinetic parameters. In recent years, various parametric and non-parametric methods have been introduced for estimation of the probability density function for a random variable based on observations, but there is very limited work reported on the optimization methods. Therefore, we have introduced a new algorithm which is the combination of MEM/TLBO named MMEM. The maximum entropy method (MEM) is one of the major and strong methods for estimating and determining the probability density with a high level of accuracy and efficiency and minimum bias. The core idea of this approach is to determine the statistical models agree with data. In other words, the MEM provides a method to find the least biased model that is consistent with the data, i.e., the maximally noncommittal with regard to missing information [1][2][3][4]. the correct estimation of the kinetic parameters. The K-L divergence measurements range from 0.001 to 0.0637 for all patients.  (Table 4) are more on the same level between patients -compared to the MEM/MAP -which biologically makes sense. Still, the AIF is estimated from the data, which makes the estimation of k more realistic than the estimation using an assumed AIF.

Discussion and Conclusions
The main purpose of this study is to connect an important problem in statistics which is to determine the probability density function of a random variable based on observations to an important problem in image processing which is to determine the AIF in situations where not enough information about the AIF is available and then accurate estimation of the kinetic parameters. In recent years, various parametric and non-parametric methods have been introduced for estimation of the probability density function for a random variable based on observations, but there is very limited work reported on the optimization methods. Therefore, we have introduced a new algorithm which is the combination of MEM/TLBO named MMEM. The maximum entropy method (MEM) is one of the major and strong methods for estimating and determining the probability density with a high level of accuracy and efficiency and minimum bias. The core idea of this approach is to determine the statistical models agree with data. In other words, the MEM provides a method to find the least biased model that is consistent with the data, i.e., the maximally noncommittal with regard to missing information [1][2][3][4].
A number of calculations and comparisons have been conducted for the estimation of the AIF and the kinetic parameters, respectively, ( Figures 5-7 and Tables 1-4 ). The results have revealed the characteristics of the empirical PDF of AIF as well as the fact that the modified maximum entropy approach performs adequately in fitting an appropriate model by comparing with the Weibull distributions to the AIF with consideration of its accuracy and applicability.
The aim of this work is not exactly to make decisions on the AIF or Cp, but it is very important how exactly the accurate kinetic parameters are determined. These parameters are so significant and are of clinical relevance, for that, clinicians can easily interpret them. Using the MMEM/MAP guarantees to minimize bias in the estimation of the AIF and the kinetic parameters Since the AIF plays an important role in the analysis of DCE-MRI, in cases where the AIF could not be determined in the image, the literature AIF is a standard technique. The proposed method in this study gives an alternative way to assess the input function from the existing data. We have shown that the proposed method allows a good fit of the data and a good estimation of the kinetic parameters.
To further evaluate this method, we propose to apply it on terms of energy efficiency and system complexity.