Applicability of Common Algorithms in Species–Area Relationship Model Fitting

Abstract: The species–area relationship (SAR) describes how species richness changes as the sampling area varies. The SAR has been studied for more than 100 years and is of great significance in the fields of biogeography, population ecology, and conservation biology. Accordingly, many algorithms are available for fitting SARs, but their applicability has not yet been numerically evaluated. Here, we selected three widely used algorithms and discuss three aspects of their applicability: the number of iterations, the time consumption, and the sensitivity to the initial parameter setting. Our results showed that the Gauss–Newton method and the Levenberg–Marquardt method require relatively few iterative steps and take less time. In contrast, the Nelder–Mead method requires relatively more iteration steps and consumes the most time. Regarding the sensitivity of the algorithms to the initial parameters, the Gauss–Newton and Nelder–Mead methods are more sensitive to the choice of initial parameters, while the Levenberg–Marquardt method is relatively insensitive. Considering that the Gauss–Newton and Levenberg–Marquardt methods can only be used to fit smooth SAR models, we conclude that the Levenberg–Marquardt method is the best choice for fitting smooth SARs, while the Nelder–Mead method is the best choice for fitting non-smooth SARs.


Introduction
The species-area relationship (SAR) is one of the most general ecological patterns, depicting how species richness changes as the sampling area increases. Notably, the SAR is a fundamental pattern in ecology, reflecting complex ecological processes closely related to the formation, migration, spread, and extinction of species [1,2]. Increasing attention is given to exploring ways in which to use the SAR for conservation purposes, including the selection of biodiversity hotspots, the determination of the optimal size and shape of natural areas, and the prediction of species extinction [3,4].
Since the concept of the SAR was first proposed, ecologists have produced a substantial body of fundamental theoretical research and practical applications of the SAR [5][6][7]. The first SAR model, put forward by Arrhenius, was a power function [8], but many mathematical variants of the SAR have since been proposed to account for the diverse shapes of SARs in practical applications [9,10]; over 30 models have been applied to SARs [11]. Although the mathematical and biological mechanisms behind the SAR have been studied preliminarily, the applicability of SAR models remains controversial. There is currently no consensus about the exact form of the species-area relationship, and its shape has remained largely unexplained [12]. The best-known and most commonly applied curves are the exponential curve and the power curve. However, there is no clear biological foundation for preferring these particular models [13], and the best-fitting model for a particular species-area curve can only be determined empirically [7].
We can use linear regression to fit SARs with a linear model, or with a linearized version of a curvilinear model, but not to fit a non-linear SAR model in arithmetic space. The free and open-source R software is currently prevalent in the analysis of ecological data and supports non-linear fitting. A wide assortment of R packages provide individual or multiple SAR model fitting functions, such as 'mmsar' and 'sars', some with effective methods for generating reliable initial parameters [14,15]. However, each of these packages uses only one algorithm for SAR model fitting, and the differences in applicability between algorithms are unknown. Without an effective procedure for initial parameter estimation, such packages can also be problematic: the algorithm may converge, yet the resulting model fit can be far from optimal. In this study, we numerically evaluated the applicability of three classic algorithms in SAR model fitting and provide insight for the choice of algorithm when fitting new SAR models.

Methods
SAR model fitting can be formulated as a nonlinear least squares (NLS) problem. We take the classic SAR model Power(x, a) = c·a^z as an example, where a stands for the area size and x = [c, z]^T is the parameter vector. Denote the observed data points by (a_1, s_1), ..., (a_m, s_m). For a given x we can write s_i = Power(x, a_i) + ε_i, where s_i stands for the observed number of species in a sampling area of size a_i, i = 1, 2, ..., m, and ε_i is the deviation between the theoretical and the observed number of species. The NLS method fits the SAR model by finding the parameter vector x_T that minimizes the sum of squared deviations. Mathematically speaking, the NLS method minimizes the function F(x) = (1/2)∑_{i=1}^{m} f_i(x)^2 with respect to the parameter x, where f_i(x) = s_i − Power(x, a_i) = s_i − c·a_i^z. The factor 1/2 in the definition of F(x) has no effect on the minimizer x_T and is introduced only for computational convenience.
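As an illustration of this formulation, the power model, its residuals f_i(x), and the objective F(x) can be sketched as follows (a minimal Python/numpy sketch rather than the paper's R code; the data values are arbitrary, noise-free test values chosen so that F vanishes at the true parameters):

```python
import numpy as np

def power_model(x, a):
    """Power SAR model Power(x, a) = c * a**z, with x = [c, z]."""
    c, z = x
    return c * a ** z

def residuals(x, a, s):
    """f_i(x) = s_i - Power(x, a_i) for the observed pairs (a_i, s_i)."""
    return s - power_model(x, a)

def objective(x, a, s):
    """F(x) = 1/2 * sum_i f_i(x)**2, the quantity the NLS fit minimizes."""
    f = residuals(x, a, s)
    return 0.5 * float(f @ f)

# Noise-free data generated from c = 5, z = 0.25: F vanishes at the truth
a = np.array([1.0, 10.0, 100.0, 1000.0])
s = 5.0 * a ** 0.25
print(objective([5.0, 0.25], a, s))      # 0.0 at the true parameters
print(objective([5.0, 0.30], a, s) > 0)  # True: any other x fits worse
```

All three algorithms below minimize this same F(x); they differ only in how each iteration chooses the next x.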
All nonlinear optimization algorithms fit a model through an iterative process. The initial parameter vector x_0 is first established, and the algorithm then produces a sequence x_0, x_1, x_2, ... converging to the optimum x_T. Although many nonlinear optimization algorithms can fit non-linear functions, this study focuses primarily on three effective classic methods: the Gauss-Newton method, the Levenberg-Marquardt method, and the Nelder-Mead method. These algorithms were selected because they are highly stable and have a wide variety of applications; furthermore, many modified versions have been derived from them [16].

The Gauss-Newton Method
According to Taylor's expansion we can obtain F(x + h) ≈ F(x) + h^T F′(x), where h, like x, is a vector. If we can find an h satisfying h^T F′(x) < 0, then F(x + h) < F(x) for sufficiently small steps; in this case we say that h is a descent direction for F. The Gauss-Newton method is based on the first derivatives of the vector function f(x). According to Taylor's expansion, for small ||h||, f(x + h) ≈ l(h) = f(x) + J(x)h, where J(x) ∈ R^{m×n} is the Jacobian, which contains the first partial derivatives of the function components. Substituting this linearization into F gives the local model L(h) = (1/2) l(h)^T l(h), from which the following equations are derived: L′(h) = J(x)^T f(x) + J(x)^T J(x)h and L″(h) = J(x)^T J(x). We can see that L″(h) is independent of h and symmetric, and if J(x) has full rank, then L″(h) is also positive definite. In that case, L(h) has a unique minimizer, which can be found by solving L′(h) = 0, i.e., J(x)^T J(x)h = −J(x)^T f(x). The resulting h is a descent direction for F, because h^T F′(x) = h^T J(x)^T f(x) = −h^T J(x)^T J(x)h < 0, where J(x)^T J(x) is positive definite.
The algorithm iteration process is as follows: (1) Set the initial parameter x_0 manually.
(2) Find the descent direction h by solving J(x)^T J(x)h = −J(x)^T f(x), then update x by x := x + h.
(3) Iterate until x is close to x_T.
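The iteration above can be sketched for the power model as follows (a Python/numpy illustration, not the paper's R implementation; the analytic Jacobian entries ∂f_i/∂c = −a_i^z and ∂f_i/∂z = −c·a_i^z·ln a_i follow from f_i(x) = s_i − c·a_i^z, and the test data and starting point are arbitrary):

```python
import numpy as np

def gauss_newton_power(a, s, x0, tol=1e-10, max_iter=100):
    """Fit s ≈ c * a**z with the Gauss-Newton iteration described above.

    Each step solves the normal equations J^T J h = -J^T f and updates
    x := x + h until the step is negligible.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c, z = x
        f = s - c * a ** z                      # residual vector f(x)
        # Jacobian of f: df_i/dc = -a_i**z, df_i/dz = -c * a_i**z * ln(a_i)
        J = np.column_stack([-a ** z, -c * a ** z * np.log(a)])
        h = np.linalg.solve(J.T @ J, -J.T @ f)  # descent direction
        x = x + h
        if np.linalg.norm(h) < tol:
            break
    return x, k + 1

# Noise-free data from c = 3, z = 0.3; start nearby and recover the truth
a = np.array([1.0, 4.0, 16.0, 64.0, 256.0])
s = 3.0 * a ** 0.3
x_hat, n_iter = gauss_newton_power(a, s, x0=[2.5, 0.4])
print(np.round(x_hat, 6))  # ≈ [3.0, 0.3]
```

Note that plain Gauss-Newton has no safeguard: if J^T J is singular or the starting point is poor, the solve fails or the iterates diverge, which is exactly the sensitivity to initial parameters examined later.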

The Levenberg-Marquardt Method
The Levenberg-Marquardt algorithm, originally suggested by Levenberg and later by Marquardt [17,18], is the most widely used optimization algorithm. It outperforms gradient descent and other conjugate gradient algorithms in a wide variety of problems [19]. This method is also based on the first derivatives of the components of the vector function f(x): as in the Gauss-Newton method, f(x + h) ≈ l(h) = f(x) + J(x)h for small ||h||. Unlike the Gauss-Newton method, however, this method adds a damping term (1/2)µh^T h, where µ is a number that changes from iteration to iteration. Minimizing the damped model yields (J(x)^T J(x) + µI)h_lm = −J(x)^T f(x), where I stands for the identity matrix. Since L(h) is approximately equal to F(x + h) only when ||h|| is sufficiently small, the algorithm may fail to converge when ||h|| is too large; conversely, if an iteration succeeds, the step size of the next iteration can be expanded, thereby reducing the number of steps needed before x approximates x_T. The Levenberg-Marquardt method evaluates the quality of each iteration by the gain ratio ϕ = [F(x) − F(x + h_lm)] / [L(0) − L(h_lm)], the ratio between the actual and the predicted decrease of F(x). If ϕ < 0, then ||h|| is too large, L is not a good approximation of F, and ||h|| should be reduced. If ϕ ≈ 1, then L(h) is approximately equal to F(x + h). The damping factor µ is therefore very important, and the choice of the initial µ-value should be related to the size of the elements in J^T J, for example by letting µ = τ × max_i{a_ii}, where a_ii are the diagonal elements of J^T J and τ is chosen by the user. During the iteration, the size of µ is controlled by ϕ: if ϕ > 0, accept h_lm and update x by x := x + h_lm; otherwise, increase µ and recalculate h_lm, until x is close to x_T.
The algorithm iteration process is as follows: (1) Set the initial parameter x_0 and the initial damping factor µ.
(2) Find the step h_lm by solving (J(x)^T J(x) + µI)h_lm = −J(x)^T f(x).
(3) Decide whether to accept h_lm according to ϕ, and adjust µ accordingly.
(4) Iterate until x is close to x_T.
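A compact sketch of this damped iteration for the power model (Python/numpy for illustration, not the paper's R code; the update rule µ := µ·max(1/3, 1 − (2ϕ − 1)^3) on accepted steps is one common strategy among several reasonable choices, and the test data are arbitrary):

```python
import numpy as np

def lm_power(a, s, x0, tau=1e-3, tol=1e-10, max_iter=200):
    """Fit s ≈ c * a**z with the Levenberg-Marquardt iteration.

    Each step solves (J^T J + mu*I) h = -J^T f; the gain ratio phi
    decides whether the step is accepted and how mu is adjusted.
    """
    def jac_res(x):
        c, z = x
        f = s - c * a ** z
        J = np.column_stack([-a ** z, -c * a ** z * np.log(a)])
        return J, f

    x = np.asarray(x0, dtype=float)
    J, f = jac_res(x)
    A, g = J.T @ J, J.T @ f
    mu = tau * float(np.max(np.diag(A)))     # initial mu = tau * max_i a_ii
    for k in range(max_iter):
        h = np.linalg.solve(A + mu * np.eye(len(x)), -g)
        if np.linalg.norm(h) < tol:
            break
        x_new = x + h
        J_new, f_new = jac_res(x_new)
        # Gain ratio phi = (actual decrease of F) / (predicted decrease of L);
        # with (A + mu*I)h = -g, the predicted decrease is h^T(mu*h - g)/2.
        predicted = 0.5 * float(h @ (mu * h - g))
        actual = 0.5 * float(f @ f - f_new @ f_new)
        phi = actual / predicted
        if phi > 0:                           # accept the step, relax damping
            x, J, f = x_new, J_new, f_new
            A, g = J.T @ J, J.T @ f
            mu *= max(1.0 / 3.0, 1.0 - (2.0 * phi - 1.0) ** 3)
        else:                                 # reject the step, increase damping
            mu *= 2.0
    return x, k + 1

a = np.array([1.0, 4.0, 16.0, 64.0, 256.0])
s = 3.0 * a ** 0.3
x_hat, n_iter = lm_power(a, s, x0=[1.0, 0.8])
print(np.round(x_hat, 6))  # ≈ [3.0, 0.3]
```

Even from a start this far from the truth, the damping keeps every accepted step a descent step, which is why the method tolerates poor initial parameters better than plain Gauss-Newton.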

The Nelder-Mead Method
Since its publication in 1965 (Nelder and Mead, 1965), the Nelder-Mead "simplex" algorithm has become one of the most widely used methods for nonlinear unconstrained optimization, and it is especially popular in the fields of chemistry, chemical engineering, and medicine [20]. Unlike the Gauss-Newton and Levenberg-Marquardt methods, this algorithm can iterate without any derivative information. Taking Power(x, a) = c·a^z as an example, the algorithm iteration process is as follows: (1) Take the initial parameter vector as one vertex of the simplex and generate two additional vertices around it, since the power model has two parameters.
(2) Construct the initial simplex S from these vertices and determine the worst, second-worst, and best vertices x_3, x_2, and x_1 in S, where x_3 = argmax F(x) and x_1 = argmin F(x) over the vertices of S.
(3) Find the centroid m of the best side, which is opposite the worst vertex x_3: m = (1/2)∑_{i=1}^{2} x_i. (4) Construct a new simplex by replacing the worst vertex x_3 through reflection, expansion, or contraction with respect to the best side. If the new simplex is constructed successfully, repeat the above steps to continue the iteration. If this fails, shrink the simplex towards the best vertex x_1, in this case replacing all vertices except x_1.
Reflection: find the reflection vertex r = m + (m − x_3), i.e., x_3 reflected through the centroid m. Expansion: if F(r) < F(x_1), try the expansion vertex e = m + 2(m − x_3) and keep whichever of e and r has the smaller function value. Contraction: if F(x_2) ≤ F(r), find the contraction vertex c = m + (1/2)(x_3 − m) (or c = m + (1/2)(r − m) when F(r) < F(x_3)) and accept it if it improves on the worst vertex. Shrink: if none of the above conditions is met, find two new vertices by shrinking towards the best vertex, x_i := x_1 + (1/2)(x_i − x_1) for i = 2, 3.
In most cases, the algorithm terminates when the simplex S is sufficiently small (some or all vertices are close enough to each other), or when the number of iterations or the number of function evaluations exceeds some specified maximum allowable number [21,22].
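Steps (1)-(4) can be sketched as a self-contained minimizer (Python for illustration; the standard coefficients 1, 2, 1/2, and 1/2 are used for reflection, expansion, contraction, and shrink, and the power-model objective below is arbitrary test data):

```python
import numpy as np

def nelder_mead(F, x0, step=0.1, tol=1e-10, max_iter=500):
    """Minimal Nelder-Mead simplex minimizer (derivative-free)."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    # (1) Initial simplex: x0 plus n vertices, each perturbed in one coordinate
    simplex = [x0]
    for i in range(n):
        v = x0.copy()
        v[i] += step
        simplex.append(v)
    for k in range(max_iter):
        simplex.sort(key=F)                       # (2) order: best ... worst
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        if np.max(np.abs(worst - best)) < tol:    # simplex small enough: stop
            break
        m = np.mean(simplex[:-1], axis=0)         # (3) centroid of best side
        r = m + (m - worst)                       # (4) reflection
        fr = F(r)
        if fr < F(best):
            e = m + 2.0 * (m - worst)             # expansion
            simplex[-1] = e if F(e) < fr else r
        elif fr < F(second_worst):
            simplex[-1] = r                       # accept the reflection
        else:
            # contraction: outside (toward r) or inside (toward the worst vertex)
            c = m + 0.5 * ((r if fr < F(worst) else worst) - m)
            if F(c) < min(fr, F(worst)):
                simplex[-1] = c
            else:                                 # shrink toward the best vertex
                simplex = [best] + [best + 0.5 * (v - best) for v in simplex[1:]]
    simplex.sort(key=F)
    return simplex[0], k + 1

# Fit the power model by minimizing F(x) directly, with no derivatives
a = np.array([1.0, 4.0, 16.0, 64.0, 256.0])
s = 3.0 * a ** 0.3

def F_power(x):
    f = s - x[0] * a ** x[1]
    return 0.5 * float(f @ f)

x_hat, n_iter = nelder_mead(F_power, [2.0, 0.5])
print(np.round(x_hat, 4))  # ≈ [3.0, 0.3]
```

For clarity this sketch re-evaluates vertices when sorting; a production implementation caches the function values so that each iteration costs only one or two new evaluations.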

Data Simulation
Numerous functions have been proposed for modelling SARs, varying in complexity from two to four parameters, as well as in the general form they produce, theoretical background, origin, and justification [23]. We chose six representative models: the Power, Gompertz, Negative exponential, Logarithmic, Weibull, and Persistence function 2 models. We used the nls function, the optim function, and the nls.lm function of the 'minpack.lm' package in R to implement the above algorithms and fit the SAR models. Their applicability is evaluated from three aspects: the number of iterations, the time consumption, and the sensitivity to the initial parameter setting.
In order to investigate the differences in the number of iteration steps and the time consumption of the algorithms when fitting the same nonlinear SAR model, we gave the Gauss-Newton, Levenberg-Marquardt, and Nelder-Mead methods the same initial parameters x_0 and had them process the same data. To highlight the differences in time consumption and the number of iterations between these algorithms, we simulated species-area data for each model by manually setting the model parameters x_M and adding "white noise" terms (Table 1). The noise is a random number generated in R within a certain range. In this way, we generated datasets in which each data point is distributed around the curve, so that we can take x_M ≈ x_T. We ran forty thousand simulations for each model and then counted the number of iterations and the time required for each algorithm to iterate from x_0 to x_T when processing the simulated data.
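A scaled-down version of this simulation (one run instead of forty thousand, in Python rather than R; the parameter values x_M = (5, 0.25), the noise level, and the starting point are our own illustrative choices) looks like:

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_power_data(c, z, n_points=30, noise_sd=0.5):
    """Simulate SAR data: s_i = c * a_i**z + white noise."""
    a = np.linspace(1.0, 100.0, n_points)
    s = c * a ** z + rng.normal(0.0, noise_sd, n_points)
    return a, s

def objective(x, a, s):
    f = s - x[0] * a ** x[1]
    return 0.5 * float(f @ f)

def gauss_newton_power(a, s, x0, tol=1e-10, max_iter=100):
    """Plain Gauss-Newton fit of the power model (as described above)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        c, z = x
        f = s - c * a ** z
        J = np.column_stack([-a ** z, -c * a ** z * np.log(a)])
        h = np.linalg.solve(J.T @ J, -J.T @ f)
        x = x + h
        if np.linalg.norm(h) < tol:
            break
    return x

x_M = (5.0, 0.25)                  # manually chosen "true" parameters
a, s = simulate_power_data(*x_M)
x_hat = gauss_newton_power(a, s, x0=[4.0, 0.3])
# The least-squares fit can never be worse than x_M itself,
# and with small noise it sits close to x_M (i.e., x_M ≈ x_T):
print(objective(x_hat, a, s) <= objective(x_M, a, s))  # True
```

Repeating this loop many times, while recording the iteration count and elapsed time per fit, reproduces the structure of the experiment behind Figures 1 and 2.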
When we study the sensitivity of the algorithms to the initial parameter setting, we only need to process general data to expose their differences. We used six real species-area data sets from published papers [24,25], covering mammal, bird, plant, and insect data. We set a range for each parameter; taking the initial parameters within this range ensures that the algorithms converge in most cases while still reflecting their differing sensitivity to the initial parameters. The more often an algorithm converges, the less sensitive it is to the initial parameters. We tested all real data sets with all models in Table 1.
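The convergence-counting idea can be sketched as follows (Python for illustration; the data, the grid of starting values, and the divergence checks are our own choices, with plain Gauss-Newton standing in as the initial-parameter-sensitive algorithm):

```python
import numpy as np

# Noise-free power-law data; the "true" parameters are c = 4, z = 0.3
a = np.array([1.0, 5.0, 25.0, 125.0, 625.0])
s = 4.0 * a ** 0.3

def gn_converges(x0, tol=1e-8, max_iter=50):
    """Return True if plain Gauss-Newton reaches a stationary point from x0."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        c, z = x
        with np.errstate(all="ignore"):          # let bad steps surface as inf/nan
            f = s - c * a ** z
            J = np.column_stack([-a ** z, -c * a ** z * np.log(a)])
        if not (np.all(np.isfinite(f)) and np.all(np.isfinite(J))):
            return False                         # iterates blew up: "NC"
        try:
            h = np.linalg.solve(J.T @ J, -J.T @ f)
        except np.linalg.LinAlgError:
            return False                         # J^T J singular: GN breaks down
        x = x + h
        if not np.all(np.isfinite(x)):
            return False
        if np.linalg.norm(h) < tol:
            return True                          # converged: "C"
    return False

# Count convergences ("C" vs "NC") over a grid of initial parameters
starts = [(c0, z0) for c0 in (0.5, 2.0, 4.0, 8.0, 16.0)
                   for z0 in (0.05, 0.3, 0.6, 1.0)]
n_conv = sum(gn_converges(x0) for x0 in starts)
print(f"Gauss-Newton converged from {n_conv} of {len(starts)} starts")
```

The same harness, run with each algorithm on each real data set and model, yields the convergence counts summarized in Figure 3.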
Table 1. SAR models used to evaluate the applicability of the different algorithms, and the functions with white noise. Here the independent variable A is the area, the dependent variable f(x) is the number of species, and b_0, b_1, and b_2 are the parameters.

Columns: No., Curve Name, Model, Parameters, Data Simulation, Shape Type.

Results
We studied the number of iterations and the time consumption using simulated data (Figures 1 and 2), and the sensitivity to initial parameters using real data (Figure 3). As we can see from Figures 1 and 2, the Gauss-Newton and Levenberg-Marquardt methods require relatively few iteration steps, while the Nelder-Mead method requires more. Interestingly, even though the Nelder-Mead method needs more iterations to converge, Figures 1 and 2 also show that it consumes less time per iteration than the other algorithms. The differences in the sensitivity of the algorithms to the initial parameters when fitting the various models to real data are shown in Figure 3: the Levenberg-Marquardt method is least dependent on the initial parameters, followed by the Nelder-Mead method, and the Gauss-Newton method has the strongest dependence. Unexpectedly, when fitting the logarithmic model, the Gauss-Newton method is not sensitive to the initial parameters and needs only one iteration to converge; this is consistent with the logarithmic model being linear in its parameters, in which case a single Gauss-Newton step solves the least-squares problem exactly.

Discussion
Many ecologists have applied these algorithms to SAR studies [26][27][28]. The algorithms are based on different principles and therefore behave quite differently in SAR model fitting. Both the Gauss-Newton and Levenberg-Marquardt methods use the Taylor expansion L(h) to approximate the nonlinear objective F(x + h), and the regression coefficients are then driven from x_0 toward the optimal coefficients x_T through several iterations that minimize the sum of squares F(x). The Levenberg-Marquardt method additionally adds a damping term and evaluates whether L(h) is a good approximation of F(x + h) before each iteration. The Levenberg-Marquardt method can be seen as a combination of the gradient descent method and the Gauss-Newton method [29]. The introduction of the damping factor µ makes the method flexible. When µ is large, the damped step is approximately h_lm ≈ −(1/µ)J(x)^T f(x); in such cases, the Levenberg-Marquardt method behaves like the gradient descent method. When µ is close to 0, the damped equation reduces to the Gauss-Newton equation J(x)^T J(x)h = −J(x)^T f(x), and the Levenberg-Marquardt method behaves like the Gauss-Newton method. The gradient descent method reduces the sum of the squared residuals of f(x) by updating the parameters in the steepest descent direction. At the beginning of the iteration, the Levenberg-Marquardt method therefore behaves like the gradient descent method and, in the final stage of the iteration process, like the Gauss-Newton method [29]. Such a strategy quickly reduces F(x + h) in the early stage of the iteration, while still maintaining a fast convergence rate in the final stage. Therefore, even though the Levenberg-Marquardt method requires evaluating the similarity of L(h) and F(x + h) in each iteration, there is not much difference in the number of iterations and running time between the Levenberg-Marquardt and the Gauss-Newton method.
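The two limiting behaviours of the damped step can be checked numerically (the Jacobian J and residual vector f below are arbitrary illustrative values, not taken from any SAR fit):

```python
import numpy as np

# Arbitrary illustrative Jacobian and residuals (m = 3 points, n = 2 parameters)
J = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [2.0, 0.3]])
f = np.array([0.7, -1.2, 0.4])
A, g = J.T @ J, J.T @ f

h_gn = np.linalg.solve(A, -g)                 # undamped Gauss-Newton step

# mu -> 0: the damped step reduces to the Gauss-Newton step
h_small = np.linalg.solve(A + 1e-8 * np.eye(2), -g)
print(np.allclose(h_small, h_gn, atol=1e-6))  # True

# mu large: the damped step tends to -(1/mu) * g, a short step
# along the steepest-descent (gradient descent) direction
mu = 1e6
h_large = np.linalg.solve(A + mu * np.eye(2), -g)
print(np.allclose(h_large, -g / mu, rtol=1e-3))  # True
```

This is the interpolation the discussion describes: the damping slides the step continuously between a cautious gradient-descent move and a full Gauss-Newton move.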
Unlike the Gauss-Newton and Levenberg-Marquardt methods, the Nelder-Mead method can take an enormous number of iterations with almost no reduction in F(x), despite being nowhere near x_T, so this method requires more iterative steps to reach x_T. Although the Nelder-Mead method requires more iteration steps, it consumes less time per iteration than the other algorithms. This is because both the Gauss-Newton and Levenberg-Marquardt methods require the derivative of the model with respect to each of its parameters. Assuming the model to be fitted has n parameters, both algorithms need on the order of n function evaluations per iteration to form the Jacobian, which undoubtedly takes considerable time. In contrast, the Nelder-Mead method requires only one or two function evaluations per iteration, which is very important in applications for which each function evaluation is time consuming.
Finally, these algorithms differ greatly in their sensitivity to the initial parameters, especially the Gauss-Newton and Levenberg-Marquardt methods. This difference is driven by the replacement of the Hessian H with J^T J in the Gauss-Newton method. Although this substitution saves a great deal of computation, it also adds instability to the algorithm, because the matrix J^T J is only positive semi-definite and may be singular or ill-conditioned. A poorly determined step h can then prevent the algorithm from converging. The Levenberg-Marquardt method, by contrast, introduces the damping parameter µ > 0, which guarantees that the matrix (J^T J + µI) is positive definite even when J does not have full rank, so the step h_lm is always a descent direction. After each iteration, the Levenberg-Marquardt method also evaluates whether the Taylor expansion L approximates the residual function F, preventing divergence due to a large ||h||; the damping parameter µ is adjusted according to the gain ratio until the convergence threshold is reached. In contrast, the Nelder-Mead method relies on the initial parameters to generate the initial simplex according to an adjustable strategy, and then converges to the optimal parameters through deformations of the simplex, so both the selection of the initial parameters and the strategy for generating the simplex affect the convergence of the algorithm. If the initial simplex is too small, the search becomes local and the Nelder-Mead method is likely to get stuck. Rigorous analysis of the Nelder-Mead method is a very difficult mathematical problem; a detailed study by Lagarias et al. [20] contains several convergence results in one and two dimensions for strictly convex functions with bounded level sets.

Conclusions
After analyzing the number of iterations, the time consumption, and the sensitivity to the initial parameters, the Levenberg-Marquardt method appears to be the best choice for SAR fitting. It is not the fastest algorithm, but when processing typical data the differences in running time between the algorithms are negligible; most importantly, the Levenberg-Marquardt method has minimal dependence on the initial parameters. In summary, we recommend the Levenberg-Marquardt algorithm for regular SAR model fitting. However, like the Gauss-Newton method, this algorithm requires derivative information, whereas the Nelder-Mead method iterates by repeatedly constructing simplexes and does not require any derivative information about the model. Consequently, the Nelder-Mead method is the best choice when the SAR model is a non-smooth function, for example when performing a breakpoint species-area regression analysis (where the derivative does not exist at the breakpoint) [30].

Figure 1. The number of steps required for the algorithms to iterate from the initial parameters x_0 to x_T. GN represents the Gauss-Newton method, LM represents the Levenberg-Marquardt method, and NM represents the Nelder-Mead method.

Figure 2. The time consumption in seconds for the algorithms to iterate from the initial parameters x_0 to x_T. GN represents the Gauss-Newton method, LM represents the Levenberg-Marquardt method, and NM represents the Nelder-Mead method.

Figure 3. The differences in the sensitivity of the algorithms to the initial parameters when fitting the various models. GN represents the Gauss-Newton method, LM represents the Levenberg-Marquardt method, and NM represents the Nelder-Mead method. C indicates that the algorithm converges; NC indicates that the algorithm does not converge.