Non-linear models are similar to linear regressions [34] in the sense of outlining the functional relationship between a continuous response variable $Y$ and a set of covariates, thus providing a statistical prediction tool. Linear regressions are used to build purely empirical models, while non-linear models are typically applied when biological or physical interpretations imply relationships between responses and covariates that are not linear [35,36]. It is important to emphasize that linearity or non-linearity refers to the unknown parameters, not to the response–covariate relationship. In this context, a non-linear regression model for representing a response variable $Y_i$ has the general form

$$Y_i = f(x_i; \boldsymbol{\theta}) + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$

where $f$ is a known function of the designed covariate $x_i$, and $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_p)^\top$ is a $p$-dimensional vector of non-linear parameters indexing $f$. Moreover, $\varepsilon_i$ denotes the random error, which is typically assumed to be normally distributed with zero mean and constant variance $\sigma^2$. It is also usual to assume that the errors are uncorrelated, that is, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$.
The most popular method for estimating $\boldsymbol{\theta}$ is non-linear least squares, which is based on minimizing the error sum of squares

$$S(\boldsymbol{\theta}) = \sum_{i=1}^{n} \big[Y_i - f(x_i; \boldsymbol{\theta})\big]^2 = \boldsymbol{\varepsilon}^\top \boldsymbol{\varepsilon},$$

where $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)^\top$. It is worth mentioning that if $\varepsilon_i \sim N(0, \sigma^2)$, then the least squares and maximum likelihood estimators of $\boldsymbol{\theta}$ are the same.
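As a brief illustration, non-linear least squares fits can be obtained in R with the nls function, whose default algorithm is the Gauss–Newton method discussed next. The data-generating curve and starting values below are hypothetical, not taken from the paper.

```r
## Hypothetical example: exponential curve fitted by non-linear least
## squares; under normal errors this also yields the ML estimates.
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2.5 * exp(0.3 * x) + rnorm(50, sd = 1)
fit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 0.1))
summary(fit)
```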
Typically, point estimates for non-linear regression coefficients are obtained from iterative optimization processes based on techniques to minimize the error sum of squares. A widespread iterative method to derive least-squares estimates for non-linear models is the Gauss–Newton algorithm. In this context, if $f$ in Equation (1) is continuously differentiable at $\boldsymbol{\theta}$, then $f$ can be linearized locally at $\boldsymbol{\theta}^{(0)}$ as

$$\boldsymbol{f}(\boldsymbol{\theta}) \approx \boldsymbol{f}(\boldsymbol{\theta}^{(0)}) + \boldsymbol{J}(\boldsymbol{\theta}^{(0)})\,(\boldsymbol{\theta} - \boldsymbol{\theta}^{(0)}),$$

where $\boldsymbol{f}(\boldsymbol{\theta}) = \big(f(x_1; \boldsymbol{\theta}), \ldots, f(x_n; \boldsymbol{\theta})\big)^\top$, and $\boldsymbol{J}(\boldsymbol{\theta})$ is the $n \times p$ Jacobian matrix whose elements $\partial f(x_i; \boldsymbol{\theta})/\partial \theta_j$ are evaluated at $\boldsymbol{\theta}$. Thus, the iterative algorithm to estimate $\boldsymbol{\theta}$ is given by

$$\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} + \big[\boldsymbol{J}^\top \boldsymbol{J}\big]^{-1} \boldsymbol{J}^\top \big[\boldsymbol{Y} - \boldsymbol{f}(\boldsymbol{\theta}^{(t)})\big], \quad t = 0, 1, 2, \ldots,$$

where $\boldsymbol{\theta}^{(0)}$ is the vector of initial values for $\boldsymbol{\theta}$, and $\boldsymbol{J}$ is evaluated at $\boldsymbol{\theta}^{(t)}$. If the errors are independent and normally distributed, then the Gauss–Newton algorithm is an application of the Fisher scoring method.
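A minimal sketch of the update above, assuming the user supplies $f$ and its Jacobian as R functions; the function names and the exponential example reuse the hypothetical setup shown earlier.

```r
## Gauss-Newton iteration for y = f(x; theta) + error; illustrative only.
gauss_newton <- function(y, x, f, jac, theta0, tol = 1e-8, maxit = 100) {
  theta <- theta0
  for (t in seq_len(maxit)) {
    r <- y - f(x, theta)                          # current residuals
    J <- jac(x, theta)                            # n x p Jacobian at theta
    step <- solve(crossprod(J), crossprod(J, r))  # (J'J)^{-1} J'r
    theta <- theta + as.vector(step)
    if (sqrt(sum(step^2)) < tol) break            # stop on a small update
  }
  theta
}

## Usage with the exponential curve f(x; a, b) = a * exp(b * x):
f   <- function(x, th) th[1] * exp(th[2] * x)
jac <- function(x, th) cbind(exp(th[2] * x), th[1] * x * exp(th[2] * x))
gauss_newton(y, x, f, jac, theta0 = c(1, 0.1))
```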
Implementations of the Gauss–Newton algorithm are available in most existing statistical software, but, in practice, there is no guarantee that the algorithm will converge from initial values that are far from the solution. In this sense, some improvements to this method can be found in the literature, such as the Gradient Descent and Levenberg–Marquardt algorithms [36].
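For instance, in R a Levenberg–Marquardt variant is available through the minpack.lm package; the call below is a hypothetical illustration reusing the exponential example above.

```r
## Levenberg-Marquardt least squares via minpack.lm (nls-like interface);
## typically more robust than plain Gauss-Newton to poor starting values.
library(minpack.lm)
fit_lm <- nlsLM(y ~ a * exp(b * x), start = list(a = 1, b = 0.1))
```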
After obtaining point estimates for $\boldsymbol{\theta}$, one may derive confidence intervals and conduct hypothesis tests by assuming

$$\widehat{\boldsymbol{\theta}} \;\overset{\text{approx.}}{\sim}\; N_p\Big(\boldsymbol{\theta},\; \sigma^2 \big[\boldsymbol{J}^\top \boldsymbol{J}\big]^{-1}\Big),$$

where $\sigma^2$ can be estimated by $\widehat{\sigma}^2 = S(\widehat{\boldsymbol{\theta}})/(n - p)$.
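A sketch of the resulting approximate 95% intervals, assuming the objects fit, x, y, and jac from the previous examples:

```r
## Asymptotic standard errors from sigma^2 * (J'J)^{-1}, with sigma^2
## estimated by the residual sum of squares over (n - p).
theta_hat  <- coef(fit)
J          <- jac(x, theta_hat)
sigma2_hat <- sum(residuals(fit)^2) / (length(y) - length(theta_hat))
se         <- sqrt(diag(sigma2_hat * solve(crossprod(J))))
cbind(lower = theta_hat - 1.96 * se, upper = theta_hat + 1.96 * se)
```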
2.1. The Semiparametric Non-Linear Regression Model
Suppose that a random experiment is conducted with $n$ subjects. The primary response in this setting is described by a random variable $Y_i$ denoting the outcome for the $i$-th subject $(i = 1, \ldots, n)$. The full response vector of the experiment is given by $\boldsymbol{Y} = (Y_1, \ldots, Y_n)^\top$, and we assume that the behavior of $Y_i$ can be partially explained by a non-linear relationship involving a designed covariate $x_i$ through a known function $f$. Simultaneously, we can consider that part of the variability of $Y_i$ can also be linearly modeled by a $k$-dimensional vector $\boldsymbol{w}_i$ of fixed covariates [37,38,39]. In this context, we have the non-linear regression model

$$Y_i = f(x_i) + \boldsymbol{w}_i^\top \boldsymbol{\beta} + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (2)$$

where $\boldsymbol{\beta}$ is a $k$-dimensional vector of regression coefficients related to $\boldsymbol{w}_i$, and $\varepsilon_i$ is the random error of the $i$-th observation. Here, we assume that the errors are uncorrelated and normally distributed with zero mean and constant variance $\sigma^2$.
A particular case arising from Equation (2) is the $p$-order polynomial regression model, which can be obtained by taking

$$f(x_i) = \theta_1 x_i + \theta_2 x_i^2 + \cdots + \theta_p x_i^p.$$
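Since this polynomial case is linear in its parameters, it can be fitted directly by ordinary least squares; the snippet below is a hypothetical illustration with a single fixed covariate w, reusing x and y from the earlier examples.

```r
## p-order polynomial in x plus a linear covariate w, fitted by
## least squares; p = 3 and w are illustrative choices.
p <- 3
w <- rnorm(length(x))
fit_poly <- lm(y ~ poly(x, p, raw = TRUE) + w)
```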
In the context of Model (2), let $\boldsymbol{x} = (x_1, \ldots, x_n)^\top$ be the full vector of designed values. In order to obtain an approximation for $f$, we assume that $x_1 < x_2 < \cdots < x_n$, and then we associate these values to each $Y_i$ non-linearly by $f_i = f(x_i)$. Thus, for each point $x_i$, we take $f_{i-1} = f(x_{i-1})$, $f_{i-2} = f(x_{i-2})$, and $\delta_i = (x_i - x_{i-1})/(x_{i-1} - x_{i-2})$ to express the approximation $\tilde{f}$ to $f$ as

$$\tilde{f}(x_i) = f_{i-1} + f'(x_{i-1})\,(x_i - x_{i-1}),$$

which is based on a Taylor series of the function $f$ around $x_{i-1}$. Now, one can notice that replacing the derivative $f'(x_{i-1})$ with the backward difference obtained from the observed data on the right side of the previous equation leads to the approximation

$$\tilde{f}(x_i) = (1 + \delta_i)\, f_{i-1} - \delta_i\, f_{i-2},$$

where $f'(x_{i-1}) \approx (f_{i-1} - f_{i-2})/(x_{i-1} - x_{i-2})$, with $i = 3, \ldots, n$. Therefore, an alternative for Model (2) is the semiparametric non-linear regression model given by

$$Y_i = (1 + \delta_i)\, f_{i-1} - \delta_i\, f_{i-2} + \boldsymbol{w}_i^\top \boldsymbol{\beta} + \varepsilon_i, \qquad (3)$$

which holds for $i = 3, \ldots, n$.
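A short sketch of this construction as reconstructed above; the function name taylor_approx and the argument names are ours.

```r
## Backward-difference Taylor approximation of f at x_i from the two
## preceding design points, for i = 3, ..., n.
taylor_approx <- function(fvals, x) {
  i <- 3:length(x)
  delta <- (x[i] - x[i - 1]) / (x[i - 1] - x[i - 2])
  (1 + delta) * fvals[i - 1] - delta * fvals[i - 2]
}
```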
2.2. Bayesian Inference
In this subsection, we address the problem of estimating and making inferences from Model (2) under a fully Bayesian perspective. Firstly, the likelihood of vector $\boldsymbol{Y}$ can be written as

$$L(\boldsymbol{\eta} \mid \boldsymbol{Y}) \propto \tau^{n/2} \exp\left\{ -\frac{\tau}{2} \sum_{i=1}^{n} (Y_i - \mu_i)^2 \right\},$$

where $\tau = 1/\sigma^2$ is the precision parameter. For the $p$-order polynomial model, we have $\mu_i = \sum_{j=1}^{p} \theta_j x_i^j + \boldsymbol{w}_i^\top \boldsymbol{\beta}$, and, specifically for the semiparametric non-linear regression model, we have $\mu_i = (1 + \delta_i) f_{i-1} - \delta_i f_{i-2} + \boldsymbol{w}_i^\top \boldsymbol{\beta}$ (with the sum running over $i = 3, \ldots, n$ in the latter case). In either case, the log-likelihood function of $\boldsymbol{\eta}$ can be expressed by

$$\ell(\boldsymbol{\eta} \mid \boldsymbol{Y}) = \frac{n}{2} \log \tau - \frac{\tau}{2} \sum_{i=1}^{n} (Y_i - \mu_i)^2 + \text{constant}.$$
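The log-likelihood above translates directly into R; the function below is a generic sketch in which mu is the model-specific mean vector (polynomial or semiparametric).

```r
## Gaussian log-likelihood in terms of the precision tau = 1/sigma^2.
loglik <- function(y, mu, tau) {
  n <- length(y)
  (n / 2) * log(tau) - (tau / 2) * sum((y - mu)^2) - (n / 2) * log(2 * pi)
}
```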
In this work, we have adopted weakly informative Normal prior distributions for the vectors $\boldsymbol{\theta}$ and $\boldsymbol{\beta}$, that is,

$$\boldsymbol{\theta} \sim N_q\big(\boldsymbol{0},\, \sigma_0^2 \boldsymbol{I}_q\big) \quad \text{and} \quad \boldsymbol{\beta} \sim N_k\big(\boldsymbol{0},\, \sigma_0^2 \boldsymbol{I}_k\big),$$

where $\sigma_0^2$ is a large fixed prior variance, and $\boldsymbol{I}_q$ and $\boldsymbol{I}_k$ are identity matrices of sizes $q$ and $k$, respectively. For the $p$-order polynomial model, we have that $q = p$. As for the parameter $\tau$, we have adopted a Gamma prior distribution with both hyperparameters equal to 0.01. We further assume prior independence among all parameters.
Now, we can express the posterior distribution of $\boldsymbol{\eta}$ as

$$\pi(\boldsymbol{\eta} \mid \boldsymbol{Y}) \propto L(\boldsymbol{\eta} \mid \boldsymbol{Y})\, \pi(\boldsymbol{\theta})\, \pi(\boldsymbol{\beta})\, \pi(\tau). \qquad (4)$$

From the Bayesian point of view, inferences for the elements of $\boldsymbol{\eta}$ can be derived from their marginal posterior distributions. Here, we have opted to use a suitable iterative procedure to draw pseudo-random samples from the approximate posterior density (Equation (4)) in order to make inferences for $\boldsymbol{\eta}$. Thus, in order to generate $N$ pseudo-random values for each element of $\boldsymbol{\eta}$, we have adopted the Metropolis-within-Gibbs (MwG) algorithm.
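As a concrete illustration, a JAGS specification of the polynomial case of Model (2) could look as follows. The variable names, the design matrices X and W, and the prior precision 0.01 (i.e., prior variance 100) are our assumptions for the sketch, not the authors' exact script.

```r
library(R2jags)

## Hypothetical JAGS model for the p-order polynomial case of Model (2).
model_code <- function() {
  for (i in 1:n) {
    mu[i] <- inprod(theta[], X[i, ]) + inprod(beta[], W[i, ])
    y[i] ~ dnorm(mu[i], tau)       # JAGS parameterizes by the precision
  }
  for (j in 1:q) { theta[j] ~ dnorm(0, 0.01) }  # weakly informative priors
  for (j in 1:k) { beta[j] ~ dnorm(0, 0.01) }
  tau ~ dgamma(0.01, 0.01)         # Gamma prior, both hyperparameters 0.01
}

## Sampling (JAGS updates the chain with Gibbs/Metropolis-type steps):
# out <- jags(data = list(y = y, X = X, W = W, n = n, q = q, k = k),
#             parameters.to.save = c("theta", "beta", "tau"),
#             model.file = model_code, n.chains = 3,
#             n.iter = 20000, n.burnin = 5000, n.thin = 10)
```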
The convergence of the simulated sequences can be monitored using trace plots, autocorrelation plots, and statistical tests (e.g., Heidelberger and Welch [40] and Geweke [41]). After diagnosing convergence, the initial samples can be discarded as burn-in. The correlation between the generated values can then be decreased by thinning the chain, so that the final sample has size $N^* \leq N$. After that, a descriptive summary of Equation (4) can be obtained through approximate Monte Carlo estimators using the generated chains. We choose the posterior expected value as the Bayesian point estimator for the elements of $\boldsymbol{\eta}$.
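These diagnostics are available in the coda package; the sketch below assumes out is the R2jags fit from the previous example.

```r
library(coda)
# mcmc_out <- as.mcmc(out)     # convert the R2jags fit to coda format
# traceplot(mcmc_out)          # visual check of mixing
# autocorr.plot(mcmc_out)      # correlation between successive draws
# geweke.diag(mcmc_out)        # Geweke convergence test
# heidel.diag(mcmc_out)        # Heidelberger-Welch stationarity test
```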
The next section illustrates the usefulness of the proposed semiparametric non-linear regression model using artificial and real datasets. All computations were performed using the R2jags package, which is available in the R environment [42]. The executable scripts can be made available by the authors upon justified request.
2.3. Model Comparison
There are many methods for Bayesian model selection that are useful for comparing competing models. The most popular is the Deviance Information Criterion (DIC), which simultaneously measures a model's fit and complexity. The DIC criterion is defined as

$$\mathrm{DIC} = \overline{D(\boldsymbol{\eta})} + p_D,$$

where $D(\boldsymbol{\eta}) = -2\,\ell(\boldsymbol{\eta} \mid \boldsymbol{Y})$ is the deviance function, $\overline{D(\boldsymbol{\eta})} = \mathrm{E}[D(\boldsymbol{\eta}) \mid \boldsymbol{Y}]$, and $p_D = \overline{D(\boldsymbol{\eta})} - D(\bar{\boldsymbol{\eta}})$ is the effective number of model parameters, where $\bar{\boldsymbol{\eta}} = \mathrm{E}[\boldsymbol{\eta} \mid \boldsymbol{Y}]$ is the posterior expected value.
Noticeably, we are not able to compute the expectation of $D(\boldsymbol{\eta})$ over $\pi(\boldsymbol{\eta} \mid \boldsymbol{Y})$ analytically. Therefore, an approximate Monte Carlo estimator for such a measure is

$$\widehat{\overline{D(\boldsymbol{\eta})}} = \frac{1}{N^*} \sum_{t=1}^{N^*} D\big(\boldsymbol{\eta}^{(t)}\big),$$

where $\boldsymbol{\eta}^{(1)}, \ldots, \boldsymbol{\eta}^{(N^*)}$ are the generated posterior draws, and so the DIC can be estimated by

$$\widehat{\mathrm{DIC}} = 2\, \widehat{\overline{D(\boldsymbol{\eta})}} - D(\hat{\boldsymbol{\eta}}).$$
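A minimal sketch of this estimator, assuming draws is a matrix with one posterior draw per row and dev is a user-supplied function returning $D(\boldsymbol{\eta})$ for one draw (both names are ours):

```r
## Monte Carlo estimate of DIC: 2*Dbar - D(posterior mean).
dic_hat <- function(draws, dev, eta_hat) {
  Dbar <- mean(apply(draws, 1, dev))  # average deviance over the chain
  2 * Dbar - dev(eta_hat)
}
```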
The Expected Akaike (EAIC) and the Expected Bayesian (EBIC) information criteria can also be used when comparing Bayesian models [43,44]. Based on the approximation for the expected value of $D(\boldsymbol{\eta})$, these measures can be estimated by

$$\widehat{\mathrm{EAIC}} = \widehat{\overline{D(\boldsymbol{\eta})}} + 2\,q^* \quad \text{and} \quad \widehat{\mathrm{EBIC}} = \widehat{\overline{D(\boldsymbol{\eta})}} + q^* \log n,$$

where $q^*$ is the total number of parameters in the model.
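Continuing the sketch above, with Dbar denoting the Monte Carlo average of the deviance and q_star the total parameter count (both names ours):

```r
# eaic_hat <- Dbar + 2 * q_star        # Expected AIC
# ebic_hat <- Dbar + q_star * log(n)   # Expected BIC
```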
Another widely used criterion is derived as a posterior measure of goodness-of-fit based on the observed and predicted values. This measure is given by

$$A = \sum_{i} \big(Y_i - \hat{Y}_i\big)^2,$$

where $\hat{Y}_i$ denotes the estimated mean of $Y_i$, which depends on the adopted model $M$. For instance, under the semiparametric non-linear regression model in Equation (3), we have that

$$A = \sum_{i=3}^{n} \big(Y_i - \hat{Y}_i\big)^2,$$

since the first two observations are not considered when computing $A$ under the semiparametric model in Equation (3).
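A one-line implementation under the semiparametric model, assuming y holds the observations and mu_hat the estimated posterior means (names ours):

```r
## Goodness-of-fit A with the first two observations dropped.
# n <- length(y)
# A_hat <- sum((y[3:n] - mu_hat[3:n])^2)
```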