1. Introduction
In recent years, the mathematical description of growth-related phenomena has become a burgeoning field of research. Today, the formal analysis of such phenomena and the prediction of future behavior from past observed data make good use of the so-called growth models. The availability of large datasets and the increasing computational capacity for the analysis and development of new statistical techniques have propelled the traditional field of population dynamics towards the center stage, where new, increasingly sophisticated models are required. Taking growth curves as a basis, researchers work on models which are applied to fields as diverse as cell growth (whether tumoral, stem, or other tissue), the exploitation of energy resources, or infection transmission and the spread of pandemics. In this latter regard, it is obvious that the SARS-CoV-2 virus has further fueled interest in the development and application of new dynamic models.
For the reasons outlined above, the use of growth curves in the study of biological phenomena has become more widespread than ever before. Recent studies have applied growth curves to the description of a variety of phenomena. For example, Ram et al. [
1] employed logistic, exponential, and Gompertz curves to the study of microbial growth in a mixed culture. Spiess et al. [
2,
3] studied the methods used in the analysis of PCR techniques (currently applied to the detection of the new coronavirus), whose observations can be modeled through growth curves. Additionally, and also concerning pandemic dynamics, Rajasekar et al. [
4] obtained some results about a SIRS model with logistic growth. Several fields of study can be formally approached by means of deterministic growth models, such as the analysis of fish growth [
5], which is carried out by considering a new framework related with the von Bertalanffy curve. However, such tools lend themselves not only to the description of phenomena: their behavior in context itself has become the object of study. In this sense, one very successful approach is to extend classic models by introducing functions that modify their original behavior and allow them to reproduce real phenomena of greater complexity. Concerning this approach, the next section makes a brief review of models that have emerged recently thanks to the incorporation of hyperbolic functions to classic models such as the logistic and the Weibull.
Despite the high degree of effectiveness shown by many of these deterministic models, many of them lack the ability to account for random factors affecting growth dynamics. To improve a model’s ability to fit a phenomenon, one must take into consideration the uncertainty generated by factors both internal and external to the process, which are unknown and therefore commonly disregarded. The same applies to errors generated by the observation and measurement of the phenomenon and to the random nature of certain dynamics. For this reason, stochastic models must be introduced which can be built on the basis of said growth curves as a basis. As a matter of fact, including certain noise factors in the ordinary differential equations which generate said curves leads to stochastic differential equations, which have, under certain conditions, diffusion processes as a solution. The literature is extensive in this regard and continues to grow, as exemplified by Hening et al. [
6], who developed a model characterized by a density-dependent growth rate and applies it to structured populations under the conditions of a heterogeneous environment. Additionally, Parsons [
7] considered some stochastic processes and their effects on populations with a logistic behavior and Román-Román and Torres-Ruiz [
8] defined a stochastic model based on a Richards-type growth curve. On the other hand, concerning birth-and-death processes, Asadi et al. [
9] dealt with a generalization of the Gompertz model showing good fit to real data from COVID-19 outbreak. In connection with the growth curves, widely used to model tumor dynamics, other studies such as the one carried out by Ascione et al. [
10] also build a fractional version of stochastic models. Furthermore, and as mentioned above, stochastic models can also be developed by generalizing classic ones (see the next section for some examples linked with the objective of the present paper).
However, to obtain stochastic models derived from growth curves, one must face certain problems concerning inference, as the very nature of said processes commonly prevents the explicit knowledge of certain features such as the transition density functions. Similarly, analytical complexity thwarts the estimation of parameters from mere observation. In this regard, non-homogeneous lognormal diffusion processes allow for many of the characteristics of the process to be known and render the process in general much more manageable. To build these processes from growth curves, the original model must usually be reformulated. The convenience of such an approach is complemented by the possibility of carrying out estimations by means of maximum likelihood procedures. Along these lines, Román-Román et al. [
11] carried out a study regarding the estimation by maximum likelihood in the lognormal process with a generic exogenous factor. This enables the estimation of various processes associated with growth curves, such as multisigmoidal models, to be addressed [
12]. Moreover, maximum-likelihood equations can also be dealt with by means of metaheuristic algorithms, as the complexity of the model may hamper efforts to solve the equations. Several algorithms exist which can be used to optimize the likelihood function, among them Simulated Annealing or Variable Neighbourhood Search, which is applied in [
8] to the case of a Richards-type model. More successful recent algorithms are, for example, the Whale Optimization or the Grey Wolf and Moth Flame optimizers algorithms (see [
13,
14,
15] for more details), all of them based on animal social behavior. In addition, the Clonal Selection algorithm [
16] is based on cell behavior and has been applied, for example, to optimization problems related with electromagnetics [
17]. Another procedure is the Harmony Search algorithm, widely extended to many applications and subject to several extensions (see [
18,
19,
20]).
To a large extent, the procedure used to develop stochastic growth models takes into account the factors stated above. However, the sheer complexity of the initial deterministic model plays a central role in the development and application of any stochastic model. In this regard, growth curves must be used that allow for a better fit to the data without needlessly increasing complexity. The present work deals with building a model based on the T-growth curve, which has been shown to have great potential in modeling tumor growth dynamics in the presence of an immune response. Given its inherent analytical simplicity, this model, which is derived from the classic logistic one, may be successfully combined with equations associated with dynamics involving the effector cells of the immune system.
The structure of the present work is as follows.
Section 2 provides a brief summary of the evolution followed by the study of hyperbolastic and derived models, both deterministic and stochastic.
Section 3 presents the T-growth curve, analyzes its main properties, and introduces a reformulation that enables asymptotic dependence on the initial conditions. Thanks to this, a stochastic model associated with the curve based on the non-homogeneous lognormal diffusion process is introduced, which is the objective of
Section 3. Inference in this process is addressed in
Section 4, which considers the use of five metaheuristic algorithms (“Clonal Selection”, “Harmony Search”, “Grey Wolf Optimizer”, “Moth Flame Optimizer”, and “Whale Optimization Algorithm”) with the goal of optimizing the maximum-likelihood function. In addition, a proposal is made concerning the strategy employed in order to bound the parametric space associated with the process.
Section 5 presents a simulation study in order to compare the performance of the aforementioned metaheuristic algorithms regarding the inference of the new process. Additionally, it shows the validity of the parametric bounding procedure proposed in these pages. Finally,
Section 6 provides an application of the proposed model and methodology to real data.
2. Brief Overview on Hyperbolastic and Associated Models
The origins of the T-type growth model can be traced back to the methodology suggested by Tabatabai et al. [
21] and Bursac et al. [
22] for generating the so-called hyperbolastic models. Their starting concept was using classic growth curves as the basis to develop other, more flexible models which do not require a large increase in the number of parameters. By virtue of this flexibility, such models can be used to represent certain real phenomena which classic models are unable to adequately explain.
Following these premises, in the aforementioned work, the authors introduced Type I, II, and III hyperbolastic curves (usually denoted as H1, H2 and H3) as the solution of certain ordinary differential equations. In essence, curves H1 and H3 are obtained by solving certain generalizations of the logistic and Weibull differential equations, while curve H2 is a non-homogeneous quadratic equation. All such differential equations have in common the presence of hyperbolic functions (or derivatives of them). The resulting models have been applied to several fields, mostly dealing with bio-sanitary phenomena such as cell proliferation. Thus, Eby et al. [
23] used hyperbolastic curves to analyze the growth of solid Ehrlich carcinoma using a variety of treatments. Tabatabai et al. [
24] used them as a model for the study of stem cell proliferation, showing how some of them, specifically H3, improve the results obtained with other models such as Deasy’s [
25]. In other fields, the models have been used to represent wound healing [
26], specifically in the research of variations in the healing speed of different types of wounds at several stages of the healing process. Mixed-effects models based on the H2 curve have also been considered (see [
27]).
Hyperbolastic curves are not the only product of the methodology outlined above. Tabatabai et al. [
28] considered the so-called oscillabolastic model, which is used to explain the behavior of dynamic biomedical data in which an oscillatory behavior is observed. Such model was applied to the study of Ehrlich ascites tumor growth and to the modeling of signal intensity in the gene expression of the Hes1 gene. Another curve of interest is the hypertabastic curve, introduced in Eby and Tabatabai [
29] within a context of survival models. In that work, the authors recommended its use to model stem cell differentiation time and suggested the inclusion in the curve of covariates that enable the study of the influence of various factors. Finally, the main subject of the present work, the T-type growth curve was introduced by Tabatabai et al. [
30] from a modified differential equation of a logistic one. As the authors mentioned, the model shows great versatility in the representation of dynamic growth phenomena, while enabling models of biphasic growth patterns to be built. Such patterns appear, for instance, in the development of tumors that display temporary decreases in their growth.
Although all these models have been shown to be useful in the description and adjustment of phenomena such as those mentioned above, they are in essence deterministic models since they do not take into account the random fluctuations that may exist in the system under study. In light of that, these curves would be well served by associating them with dynamic models with the power to represent such random information. Some such models are those representing stochastic processes and, more specifically, diffusion processes. Such processes arise when white noise is introduced into the ordinary differential equation that produces the curve, giving way to the consideration of stochastic differential equations whose solution are, under certain conditions of regularity, diffusion processes. These processes are advantageously applied to the computation of multiple statistics aiming to describe the dynamics under study, such as the mean, mode or quantile functions. These statistical functions could also be used for prediction purposes. At the same time, this stochastic-type approach makes it possible to address another series of problems associated with the study of these types of dynamic variables. Among them, the determination of temporal variables that represent the moment in which the variable under study verifies a certain condition stands out, such as the moment of inflection or the moment in which a certain threshold is exceeded. These questions can be posed as first-passage time problems for which there is a well-established methodology within the field of diffusion processes, from theoretical and applied perspectives (see [
31,
32,
33], respectively).
Within this context, Barrera et al. [
34,
35] established two different diffusion processes associated with curves H1 and H3. In both cases, they analyzed their properties and addressed the estimation problem. Both processes were used for the determination of fluorescence, as a function of the cycle, in the application of the qPCR technique of molecular biology. On the other hand, Barrera et al. [
36] presented several alternatives for modeling oscillabolastic-type behavior, more specifically a Gaussian-type process and another derived from the inhomogeneous lognormal process.
3. The T-Growth Curve
The T-growth curve is defined as the solution of the ordinary differential equation
with initial condition
. Parameter
is the growth rate and
is the asymptotic value of the quantity
, known as the
carrying capacity. Its solution is (see [
30]),
where
is a parameter depending on the initial values,
Note that Equation (
1) can be viewed as the classic logistic differential equation with a multiplicative factor
. Solving
for
t, it follows that the T-growth model grows faster than the logistic model when
.
As is well known, in this type of sigmoidal increasing curves, the inflection point of the curve determines the time instant of maximum growth velocity. Thus, this point is of special interest in any study about population growth. As a matter of fact, the T-growth model may have different inflection points, which increases the flexibility of the curve and its ability to adapt to different behaviours.
The time
at which inflection occurs verifies the equation
After some algebra, and from Expression (
2), if follows that the value of the curve at the inflection time is
This guarantees that, even if it depends on growth rate , at inflection , the value of the curve is greater than half of its maximum capacity, that is .
Another remarkable time is that at which the curve reaches a particular value. The first time the curve crosses a threshold
T such that
is important in terms of growth analysis. Indeed, such threshold can be expressed as
, where
is a positive number representing
of the total carrying capacity. If such
crossing-time is denoted by
, it means that
. Solving this expression, it follows that
A different approach can be taken by considering the threshold as a quantity
consisting of
times the initial value of the curve. As far as the study of growth phenomena is concerned, this approach is particularly useful in fields such as the analysis of tumor growth, as it can be used to estimate the moment at which the initial volume of the tumor reaches a particular size (expressed in percentages). This expression could be also included in other dynamical systems, e.g., those taking into account the dynamics of the tumor growth under certain treatments. Now, being
the time at which the curve reaches
n times its initial value, then
, and finally
Please note that
, provided that
M is an asymptotic value. Then, it follows that
.
Figure 1 shows the evolution of times
and
for
k when the carrying capacity
, and for
n when the initial volume is
, respectively. The growth rate is
. On the one hand,
and, on the other,
. The effect of
is shown in the scale of the vertical axis. Actually, as the growth rate increases, the range of each time becomes smaller.
4. Stochastic Model
Deterministic models such as the T-growth curve may be useful to predict and describe the behavior of dynamic phenomena, as they take into account known variables and the relations between them. Nevertheless, in real life, factors remain that are unknown to the researcher. In addition, errors or misleading interpretations of the measurements (caused sometimes from the instruments themselves) may arise during the observation phase and the collection of data. Finally, the random nature of many processes and affecting factors may have an influence on the dynamics under study. For all of these reasons, the use of deterministic models in such studies is not advised. Stochastic models, on the contrary, are able to address these issues and provide an effective way to study, analyze, and describe dynamic phenomena such as the ones concerning growth. For this reason, this section proposes a stochastic counterpart to the T-growth model.
There are different ways to construct such model, but many of them result in equations and processes that are, due to their complexity, useless from a theoretical or even practical point of view. Here, a lognormal diffusion process with exogenous factors, based on the T-growth curve, is proposed. This model can be viewed as a nice one, meaning that many useful properties and explicit expressions are satisfied.
The construction of the model relies on the property, satisfied by the reformulated curve (
4),
where
is the real continuous function given in (
6). Such function can be viewed as the time-dependent fertility rate of a Malthusian model. A stochastic version follows from the perturbation of such fertility, by replacing
by
, where
is a white noise with variance
. As a result, the stochastic differential equation
follows, where
is the standard Wiener process, independent on initial value
for every
. The continuity of function
implies the existence and uniqueness of a solution. In this case, such solution
is meant to be a diffusion process for which an explicit formula is given by
Its finite-dimensional distributions, provided
is either a degenerate variable (i.e.,
) or lognormal distributed
are lognormal. That is, for
and time instants
, the random vector
follows an
n-dimensional distribution
, where
are the components of vector
, for
, and matrix
, respectively.
Then, it follows that the conditional distribution is also lognormal, that is, for
and state
y,
This leads to the explicit form of the probability transition density function, allowing future inference procedures such as maximum-likelihood estimation with the exact density and metaheuristic optimization of an objective function derived from such density.
Finally, the theory about lognormal processes allows us to find explicit expressions for many useful functions. Following Román-Román et al. [
11], many of these characteristics can be expressed in a unique way by the function
for a vector
and a generic state
y, being
. Different values of
and
y lead to some interesting characteristics, as summarized in
Table 1.
Please note that, in the case of the mean function of the process, it follows
which is the T-growth curve starting at
.
5. Inference on the Model
The present section addresses the estimation of the parameters by means of maximum-likelihood procedure. In the particular case of the T-growth model, there are only two parameters, namely and , in addition to the diffusion coefficient .
Let us consider the observation of d sample paths at time instants for and . For the sake of expositional clarity, we suppose that , i.e., the first observation time, is the same for all sample paths, that is, for all i. Now, if , we denote as the matrix of observations. Let us denote as the observation of variable at instant . Such observations form a vector . On the other hand, let us consider an initial lognormal distribution, i.e., of parameters and .
To address the maximum-likelihood estimation, and regarding the conditional distribution of the process (and hence the knowledge of its probability transition density function), we deal with the corresponding log-likelihood function:
for any observation set
, where
and, for every
and
, the following grouped variables have been defined:
The optimization of this function can be carried out by different methods. The direct one is to solve the maximum-likelihood equations. To this end, numerical methods might be required. Nevertheless, the convergence conditions imposed by numerical methods such as Newton–Raphson and its derivatives cannot always be guaranteed. As a matter of fact, such methods require calculating (or at least approximating) the inverse of the Hessian matrix. The existence of this inverse cannot always be guaranteed given the dependence of the system of likelihood equations on the observed data. On other occasions, this inverse may be badly conditioned and give rise to unstable solutions. Additionally, these methods require initial solutions whose determination is not always easy to carry out both due to the difficulty in calculating them and because of their optimality to favor the convergence of the numerical method. Another way to solve the maximum-likelihood estimation is based on the use of metaheuristic algorithms. Such algorithms have the ability to find the local and global minima (or maxima) of a function within a bounded region of the parametric space. These boundaries depend on the properties of the parameters and the original model.
5.1. Bounded Parametric Space
Many metaheuristic algorithms, including those used in this work, require a bounded region within the parametric space. The Grey Wolf Optimizer, for example, will not work properly if the wolves are unable to meet each other while moving without limitations. For that reason, an initial region to search maxima or minima could avoid issues related with the convergence of the algorithm.
The stochastic T-growth model has two parameters, named , which do not have, a priori, upper bounds. However, in the context of growth phenomena, some assumptions could lead to these parameters being totally bound.
Indeed, we suppose that there exists at least one inflection point in our model. This is due to the required flexibility of the model and its relation with inflections. If observations could be explained by a curve with no inflections, many other simpler models would be preferable over the T-growth.
5.1.1. Case
By assuming that there exists an inflection at time
, inflection condition (
5) holds for every
and
. Let us then define auxiliary functions
and
. Then (
5) could be expressed as
where
for all
t. After some algebra, it follows that
is monotonically increasing and
. Since
, it follows that the range of
is the interval
. Analogously, it can be shown that
, with
with a maximum at
. It follows that the range of
is
. Given this, we can conclude that
has no solutions when the ranges of
and
do not intersect, that is, when
. Solving for
shows that, if inflections do exist, then
. Furthermore, a more precise upper bound can be obtained by solving (
5) for
. Indeed, for an inflection time
, it follows that
Then, can be viewed as a function of . Then, and , i.e., for high values of the growth rate, and when there exists an inflection, parameter must decrease in order to maintain such flexibility. The study of leads to the existence of a maximum at , where . Then, the maximum value of is . With this, it can be concluded that, for a T-growth model with at least one inflection point, it holds that .
On the other hand, a bounded interval of
can be obtained from the expression of the carrying capacity. Indeed, from the original expression of such asymptotic value
M and its expression in the reformulation, if follows that
Solving for
leads to a function of
,
,
This function verifies that , but from it follows that the growth is slow for lower values of . This implies that, for a positive lower bound of , and even of high orders such as , a bounded interval for follows. Furthermore, being the growth rate in the context of growth phenomena, it would be reasonable to consider . That is, if , for , then .
5.1.2. Case
The methodology described above requires that
in order to get the expression of
and hence the interval of
. However, when
, this approach cannot be accomplished. In such case, provided the relation
between
and the carrying capacity, and where
is the last observed value of the
ith path, we propose that
be bound as follows
Then, solving (
5) at each one of those bounds of
and inflection time
, corresponding bounds for
follow. Such equation might be solved, for example, by applying the bisection method. Even in such case, the quest for an interval where the root (here, a bound of
) lies, may lead to two minimal intervals where the two bounds lie. Another possibility here is to consider a mixture of those two intervals, by taking the minimum and the maximum of the four values as the lower and upper ends, respectively, of the interval of
.
5.2. Metaheuristic Approaches
In this subsection, the metaheuristic algorithms used to estimate the maximum-likelihood parameters of the T-growth model are introduced. The common framework for these procedures comprises a bounded parametric space and an objective function to optimize. The parametric space can be bounded according to the aforementioned methodology. On the other hand, the objective function is obtained from the log-likelihood function (
7) by getting rid of constants and initial-value terms. Indeed, the function to be optimized is
5.2.1. Clonal Selection Algorithm
The “Clonal Selection” algorithm (CLONALG) was introduced by Castro and Zuben [
16], and it is based on the clonal selection principle, which studies the immune response to antigenic stimuli. Related to evolutionary algorithms, clonal selection has shown great performance in engineering, as illustrated in [
17,
38]. The algorithm describes two populations, one of antigens and one of antibodies, and the relations of affinity between them. The basic steps can be summarized as follows:
Initialize random population of antigens.
Determine the affinity to all antibodies.
Select the antibodies showing the highest affinity and clone them according to their antigenic affinities.
Mature generated clones inversely to the antigenic affinity (high affinity, small mutation rate).
Select those with the highest affinity to antigens and replace if it is better.
Check the stopping criteria (maximum number of iterations or good fitness).
5.2.2. Harmony Search Algorithm
The “Harmony Search” algorithm (HS) was introduced by Geem et al. [
18] and improved in many recent works (e.g., [
19]). This algorithm tries to mimic the process of improvisation performed by music players. With fewer mathematical requirements than other heuristic algorithms, it has been successful in its application to several optimization problems (see [
20]). Its steps are as follows:
Initialize a matrix of random values, sorted by the value of the objective function, called “Harmony Memory” (HM).
Improvise a new harmony (generated according to the parameters) after considerations of HM.
Update HM by including the new harmony (according to fitness, and replacing if better) and excluding the worst harmony.
Check the stopping criteria (maximum number of improvisations).
5.2.3. Grey Wolf Optimizer
This algorithm, inspired by nature, was introduced by Mirjalili et al. [
14] and attempts the leadership structure of wolf packs. Some of its results have been successfully applied, and this in turn has produced some interesting modifications (see [
39,
40]). The algorithm is based on the hierarchy of wolf populations, in which individuals are classified as leaders (both male and female) or
, followed by
and
in decreasing order or leadership. The rest of the wolves are tagged as
. The iterations consist of updating the leadership according to fitness. The rest of the wolves move towards their leaders. Its steps can be summarized as:
Initialize the random population of wolves and established the hierarchy according to the fitness coming from the objective function.
Update the positions of wolves according to the position of leaders. This updating is performed in several stages, such as hunting, searching, or attacking the prey.
Replace the leaders according to better fitness.
Check the stopping criteria and return as the optimal value.
5.2.4. Moth Flame Optimizer
The “Moth Flame Optimizer” (MFO) (see [
15]) is a population-based algorithm inspired by the particular navigation method exhibited by moths. These insects fly with a fixed angle with respect to the Moon. This technique, called transverse orientation, allows them to fly long distances efficiently. Some works about the applications and modifications of this highly successful optimizer are found in [
41,
42]. The algorithm considers two populations, one of moths and one of flames (flags). The former act as search agents across the parametric space, whereas the latter represent the matrix of best solutions according to the fitness of the objective function. By this procedure, moths are ensured to never discard the best solution. Its steps are as follows:
Initialize random population of moths and calculate fitness according to objective function. This leads to a flame as the best position so far.
Update position of the moths moving around the flames, which decrease in number with every iteration.
Update flame position according to better solutions.
Check the stopping criteria (maximum number of iterations).
5.2.5. Whale Optimization Algorithm
The “Whale Optimization Algorithm” (WOA) was introduced by Mirjalili and Lewis [
13] and has shown very good results in, among others, the study of combinatorial problems. This algorithm draws from nature to mimic the social behavior of humpback whales. This species has a particular way of hunting called “bubble-net”. It creates bubbles in a circular pattern in order to hunt their prey by driving it closer to the surface of water. The algorithm models such ascending spirals of bubbles in order perform the optimization process. Its basic steps can be summarized as follows:
Initialize a population of random whales and calculate fitness according to the objective function.
Update whale positions randomly or according to the bubble-net technique.
Update the best search agent (whale), which is the one approaching the best fitness.
Check the stopping criteria (maximum number of iterations).
6. Simulation Study
The metaheuristic algorithms described above were applied to simulated data from the T-growth stochastic process in order to study the robustness of the proposed methodology concerning the bounded parametric space. Specifically, 25 sample paths were simulated in the same time interval, , for . The time step was calculated for a sample size n, which takes values 11, 31, and 51. The initial variable here was degenerate at , meaning that . Parameters and were fixed at values 0.25 and 0.1, respectively. Furthermore, diffusion coefficient took values 0.01, 0.05, and 0.1 in order to illustrate the performance of the algorithms in the parametric space under conditions of low as well as high variability. For each algorithm and every combination of parameters, 50 replications were made, and their mean was taken as the final value. Regarding the metaheuristic algorithms, none required parameter tuning, and the default values recommended by their respective authors were used.
The results were determined by considering the relative absolute error between the observed mean and the mean of the simulated process with the estimated values. The expression of such measure is
where
is the value of the observed mean at time
and
is the value of the mean of the estimated process at the same time.
N is the corresponding sample size in every case.
The results are summarized in
Table 2, for all combinations between
and
n. It is shown that all the metaheuristics perform very well despite the sample size or the variability. The parametric space was bounded according to the proposed methodology, and it was the same for all five algorithms.
Estimated values for parameters
,
, and
are shown for each algorithm in the rest of the tables. For each estimation, the corresponding mean relative error is shown, which was calculated according to
where
is the corresponding parameter and
its estimated value.
As displayed in
Table 2, the “Moth Flame Optimizer” algorithm performs very well in all possible scenarios, even when sample size is small and variability is high. The estimated parameters obtained by this metaheuristic are shown in
Table 3, where the maximum MRE is just 0.0067, corresponding to the case
and
. The best estimation obtained with this algorithm corresponds to the case
and
. This result is expected due to the low variability and large sample size. Nevertheless, as illustrated by
Table 2, the best estimation of MFO occurs when
and
. This is explained by measure MRE being affected by the deviation of any parameter, whereas the RAE takes the mean of the deviations with respect to the estimated mean (taking into account the estimation of
).
Besides MFO, the other algorithms also perform well, although they present higher errors in cases of high variability. This is shown in detail in
Table 4,
Table 5,
Table 6 and
Table 7 for the WOA, GWO, CLONALG, and HS algorithms, respectively.
In particular, the aforementioned tables show that, in general, MRE decreases as the number of points n increases, except maybe for the case , where it remains the same or even increases slightly. In any case, values are low and acceptable. The final results for the parameters are essentially the same despite the algorithm employed. Focusing on each parameter, a maximum variation of approximately 0.04 is apparent for values of across all algorithms. This variation occurs when and for (the maximum and minimum estimates of , respectively). Of course, such high variability has an impact on the solutions and appears to be particularly sensitive to variations of n. Nevertheless, MRE decreases across all algorithms for such cases. On the other hand, results for parameter , mainly involved in the flexibility of the curve, show a maximum variation below 0.004. In particular, the cases with and with present the largest differences, although these remain very small.
All of these results show the robustness of the strategy proposed to bound the parametric space. As a matter of fact, and despite their disparate behaviors, all the algorithms were able to find optimal values with low errors. A particular case of interest is that of low sample sizes. This is a very common case in many practical applications, which makes algorithm performance within the bounded parametric space all the more important. The tables mentioned above show very good performance for low sample sizes. Additionally, even when errors naturally decrease as sample size increases in nearly every algorithm, the estimations when remain quite good. Once again, MFO shows lower errors for small sample sizes even when variability is high. Nevertheless, the values yielded by the rest of the algorithms remain good, to an extent that guarantees the robustness of the procedure and the applicability of this model and strategy to the description of growth phenomena in which few data observations are available.
Finally,
Figure 3 shows a comparison between the observed and the estimated means for every algorithm in every case. The conclusions drawn are the same, with variations being somewhat higher in the case
. In particular, simulated mean paths for the results of every algorithm are shown against the sample mean observation. Cases are sorted by increasing values of
from top to bottom and increasing values of
n from left to right. As can be expected, figures in the first row show small differences between the paths. On the contrary, the last row plots show wider variability due to value
. With respect to
n, such differences are less noticeable in the plots, but
illustrates that high values of
n lead to slightly larger distances between paths.
7. Application to Real Data
In this section, the proposed methodology is applied to real data extracted from the work of Song et al. [
43]. The dataset concerns the volume of tumors in nasopharyngeal cancer, a disease with high incidence in Southern China. Despite classic treatments such as radiotherapy, chemotherapy, or immunotherapy, local recurrence and distant metastasis are frequent among patients. One technology, called RNA interference (RNAi), uses post-transcriptive gene silencing and has shown good performance inhibiting tumor growth by silencing one single gene involved in tumor development. In [
43], it is shown that the effect of RNAi in the reduction of the tumor volume is greater when it is applied to multiple involved genes at once than when applied to a single gene. Nasopharyngeal tumor volume is measured for a number of weeks in several groups according to the affected gene. The involved genes are VEGF, C-myc, Survivin, and hTERT. In the work introducing the T-growth curve, Tabatabai et al. [
30] fitted the model to data related to the VEGF gene, showing better results than other classic models such as the logistic or Gompertz curves. In this section, the proposed T-growth stochastic model is fitted to data for all four genes. Following the methodology, parameter estimation was performed considering the five different metaheuristic algorithms with
,
and a time step of 1 (weeks). The results are shown in
Table 8. Given these estimations, five T-growth processes were simulated in order to compare their mean paths with the original data (see
Figure 4). Again,
Table 8 shows that relative absolute errors are low and very similar, with MFO and GWO yielding the best results. In particular, the MFO algorithm boasts great performance, as it did with simulated data. It is worth noting that the process and the proposed methodology are here applied to a small dataset,
. The results are consistent with others in simulations with few data points like
. The T-growth stochastic model could thus be used in this context to describe the evolution of tumor volume for the RNAi technique acting over a variety of genetic origins. Furthermore, in [
30], the T-growth curve was introduced in a system of equations concerning the dynamics of effector and tumor cells. Such proposal opens possibilities for using the T-growth stochastic model within systems of stochastic differential equations describing more complex behaviors in the dynamics of the RNAi technique and tumor volume.
8. Conclusions
In the present work, a new stochastic growth model is proposed. It is based on the T-growth curve, which is an extension of the classic logistic growth curve. This new model has some interesting features which make it a suitable choice to describe growth phenomena without compromising efficiency in the estimation of the parameters. As a matter of fact, the T-growth model only has two parameters, in addition to the diffusion coefficient and a simple analytic expression. The use of the hyperbolic sine inside the exponential function leads to a significant increase in the flexibility of the curve, and the presence of inflections makes the model a good fit for a wide range of growth data.
In many practical applications, parameter estimation is a major issue. In this case, the low number of parameters and simplicity of the analytical formulation make this easier to address, although the estimation may require the use of metaheuristic algorithms.
Parameter estimation is particularly interesting given the model’s possibilities for real-world applications. Maximum likelihood estimation leads to a system of equations that cannot be solved explicitly, and therefore numerical resolution methods are required. The fact that the verification of the applicability hypotheses of common numerical methods cannot always be guaranteed makes it advisable to study alternatives, such as metaheuristic algorithms.
As a general rule, these procedures require a bounded parametric space to work properly, which in the previous pages is obtained by means of a strategy that employs the T-growth model.
The robustness of such strategy was observed after studying the performance of five different well-known metaheuristic algorithms. All such algorithms were able to estimate parameters , , and with low relative absolute errors. Simulations were carried out under different values of parameter and sample size n. The reason these were chosen is that, in practical applications, it must be known whether the process and the estimation are able to perform well with few data values and in conditions of great variability.
The possible limitations of this model become more evident in contexts which require a high number of parameters. However, given its simplicity, its low number of parameters, and the robustness of the methodology developed for bounding the parametric space when it is applied to several metaheuristic algorithms, the T-growth stochastic model appears as a good choice to describe, analyze, and make predictions within many different fields of practical application.
Future research lines concerning this model could rely on the study of asymptotic distributions of the estimators among other statistics properties derived from the diffusion process. In addition, a more practically oriented approach may follow from the study of dynamic statistical variables such as stopping times or first-passage times. Finally, the introduction of the model into dynamic systems (by taking advantage of its simplicity) may lead to more sophisticated ways to analyze and predict complex behaviors.