Robust Estimation and Tests for Parameters of Some Nonlinear Regression Models

Abstract: This paper uses the median-of-means (MOM) method to estimate the parameters of nonlinear regression models and proves the consistency and asymptotic normality of the MOM estimator. In particular, when there are outliers, the MOM estimator is more robust than the nonlinear least squares (NLS) estimator and the empirical likelihood (EL) estimator. On this basis, we propose hypothesis-testing statistics for the parameters of nonlinear regression models using the empirical likelihood method, and simulations demonstrate the superiority of the MOM estimator. We apply the MOM method to analyze the 2019 GDP data of the top 50 cities of China; the results show that the MOM method performs better than the NLS and EL estimators.


Introduction
A nonlinear regression model refers to a regression model in which the relationship between the variables is not linear. Nonlinear regression models have been widely used in various disciplines. For instance, Hong [1] applied a nonlinear regression model to economic system prediction; Wang et al. [2] studied the application of nonlinear regression models to the detection of protein layer thickness; Chen et al. [3] utilized a nonlinear regression model in the price estimation of surface-to-air missiles; and Archontoulis and Miguez [4] used nonlinear regression models in agricultural research.
The principle of median-of-means (MOM) was first introduced by Alon, Matias, and Szegedy [5] in order to approximate the frequency moments of a data stream with limited space. Lecué and Lerasle [6] proposed new estimators for robust machine learning based on MOM estimators of the mean of real-valued random variables; these estimators achieve optimal rates of convergence under minimal assumptions on the dataset. Lecué et al. [7] proposed the MOM-minimizers estimator based on the MOM method, which is very effective when the dataset may have been corrupted by some outliers. Zhang and Liu [8] applied the MOM method to estimate the parameters in multiple linear regression models and AR error models for repeated measurement data.
For the unknown parameters of a nonlinear regression model, Radchenko [9] proposed the nonlinear least squares (NLS) estimator to approximate the unknown parameters. Ding [10] introduced the empirical likelihood (EL) estimator of the parameters of the nonlinear regression model based on the empirical likelihood method. However, when there are outliers, these standard methods are sensitive and easily affected by the outliers, as shown by Gao and Li [11]. Building on the work of Zhang and Liu [8], this paper applies the MOM method to estimate the parameters of nonlinear regression models and obtains more robust results.
The paper is organized as follows. In Section 2, we review the definition of the nonlinear regression model, introduce the MOM method in detail, and prove the consistency and asymptotic normality of the MOM estimator. In Section 3, we introduce a new test method based on the empirical likelihood method for the median. Section 4 illustrates the superiority of the MOM method with simulation studies. A real application to GDP data is given in Section 5, and the conclusion is discussed in the last section.

Applying the Median-of-Means Method to the Nonlinear Regression Model
We consider the following nonlinear regression model introduced by Wu [12]:
y_i = g(θ, x_i) + ε_i, i = 1, ..., T,
where θ = (θ_1, ..., θ_k)^T is a fixed k × 1 unknown parameter column vector, x_i is the i-th "fixed" input vector with observation y_i, g(θ, x_i) is a known (usually nonlinear) functional form, and the ε_i are i.i.d. errors with mean 0 and unknown variance σ². According to Zhang and Liu [8], the MOM estimator of θ is produced by the following steps.

Step I: We separate (y_i, x_i), i = 1, ..., T, into g groups. The number of observations in each group is n = T/g (for convenience of calculation, we assume that T is divisible by g). Concerning the choice of the number of groups g, Emilien et al. [13] suggest g = ⌈8 log(1/ζ)⌉ for any ζ ∈ (0, 1), where ⌈·⌉ is the ceiling function. In practice, the structure of the observations is unknown and the diagnosis of outliers is complicated, so we usually set ζ = C/√T for some constant C, regardless of outliers.
Step II: We estimate the parameter θ in each group j, 1 ≤ j ≤ g, by the nonlinear least squares estimator θ̂^(j) = (θ̂_1^(j), ..., θ̂_k^(j))^T.

Step III: The MOM estimator of θ is the componentwise median of the group estimates, θ̂_MOM = (Median(θ̂_1^(j), j = 1, ..., g), ..., Median(θ̂_k^(j), j = 1, ..., g))^T (a Python sketch of Steps I–III is given below, after Theorem 1).

The asymptotic properties of θ̂_MOM are summarized in the following theorems. Their proofs are postponed to Appendix A.

Theorem 1. For some constant C and any positive integer g, we suppose the following: (I) Θ is an open interval (finite or infinite) of the real axis E^1, and for certain 0 < a < b < ∞ and any θ_1, θ_2 ∈ Θ,
a(θ_1 − θ_2)² ≤ ϕ_n(θ_1, θ_2) ≤ b(θ_1 − θ_2)², where ϕ_n(θ_1, θ_2) = n^{-1} ∑_{i=1}^{n} [g(θ_1, x_i) − g(θ_2, x_i)]²,
and ϕ_n(θ_1, θ_2) = ∞ for θ_1 ≠ θ_2 if at least one of the points θ_1, θ_2 is −∞ or ∞. Suppose in addition that E|ε_1|^s < ∞ for some s ≥ 2. Then, for n ≥ N_0 and sufficiently large positive ρ,
P(√n |θ̂^(j) − θ| ≥ ρ) ≤ c ρ^{−s},
where c does not depend on n and ρ.
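To make Steps I–III concrete, here is a minimal Python sketch of the procedure for a scalar or low-dimensional parameter, assuming the group-wise fits are done with SciPy's curve_fit; the function name mom_estimator and the example model are our own illustrative choices, not part of the original procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def mom_estimator(x, y, g, model, p0):
    """Median-of-means estimator of theta: a sketch of Steps I-III.

    x, y  : observations (length T, with T assumed divisible by g)
    g     : number of groups, e.g. g = ceil(8 * log(1 / zeta))
    model : callable model(x, *theta) giving g(theta, x)
    p0    : starting values for the group-wise NLS fits
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y) // g                              # Step I: n = T / g observations per group
    group_ests = []
    for j in range(g):                           # Step II: NLS estimate in each group
        sl = slice(j * n, (j + 1) * n)
        theta_j, _ = curve_fit(model, x[sl], y[sl], p0=p0, maxfev=10000)
        group_ests.append(theta_j)
    return np.median(np.array(group_ests), axis=0)   # Step III: componentwise median

# Illustrative use with the exponential model of Example 3, y = exp(0.5 x) + eps:
# theta_hat = mom_estimator(x, y, g=10, model=lambda x, a: np.exp(a * x), p0=[0.1])
```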

Empirical Likelihood Test Based on MOM Method
In Section 2, we used the MOM method to estimate the parameters of the nonlinear regression model. In this section, we consider testing the hypothesis that θ equals a given value θ_0, based on the empirical likelihood method for the median.
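Since the detailed test statistic is only summarized here, the following Python sketch is an assumption-laden illustration rather than the paper's exact construction: it tests whether the median of the group-wise estimates equals θ_0 via empirical likelihood for a median, using indicators T_{n,j} = I(θ̂^(j) ≤ θ_0) and the Lagrange equation that appears in the proof of Theorem 3 in Appendix A; the function name mom_el_test and the χ²_1 calibration are our choices.

```python
import numpy as np
from scipy import stats

def mom_el_test(theta_hats, theta0):
    """Hypothetical MOMEL test sketch: empirical likelihood that the median of the
    group-wise estimates theta_hat^(j) equals theta0.

    theta_hats : 1-D array of the g group-wise NLS estimates (scalar parameter case)
    theta0     : value of theta under the null hypothesis
    """
    z = (np.asarray(theta_hats) <= theta0).astype(float) - 0.5   # T_{n,j} - 0.5
    g = len(z)
    m = int(np.sum(z > 0))                     # groups with theta_hat^(j) <= theta0
    if m == 0 or m == g:                       # EL ratio degenerates in these corner cases
        return np.inf, 0.0
    # z_j only takes the values +-1/2, so the Lagrange equation
    # sum_j z_j / (1 + lambda * z_j) = 0 has the explicit solution:
    lam = 2.0 * (2 * m - g) / g
    stat = 2.0 * np.sum(np.log1p(lam * z))     # -2 log(EL ratio)
    p_value = 1.0 - stats.chi2.cdf(stat, df=1) # chi^2_1 calibration (our assumption)
    return stat, p_value

# Example: reject H0: theta = theta0 at the 5% level if p_value < 0.05.
```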

Simulation Study
In this section, we use the R software for simulation. Simulation experiments are carried out to compare the performance of the MOM estimator with the nonlinear least squares (NLS) estimator and the EL estimator under the "no outliers" and "with outliers" cases in Examples 1–3. The Mean Square Error (MSE) of θ̂_EL, θ̂_MOM and θ̂_NLS is defined as
MSE = (1/D) ∑_{q=1}^{D} (θ̂_q − θ_q)², (8)
where θ̂_q and θ_q in formula (8) represent the estimated value and the true value of the parameter, respectively, and D represents the total number of simulations; in this article, D = 1000. The MSE results reported in Tables 1–3 are all multiplied by 100 and are accurate to three decimal places. In Examples 4–6, we compare our proposed method with the empirical likelihood inference proposed by Jiang [15].
We report the empirical sizes and powers of the two methods, where size is the probability of rejecting the null hypothesis when it is true; a value close to the nominal significance level, set to 0.05 in this paper, is good. Power is the probability of rejecting the null hypothesis when it is false; a value close to 1 is good. The empirical size or power equals n_1/D, where n_1 is the number of times the null hypothesis is rejected in D simulations. In Tables 4–6 of this article, "size" refers to the empirical size and "power" refers to the empirical power; the empirical size is an estimate of the size and the empirical power is an estimate of the power. We consider the following three forms of nonlinear regression models, which were also considered by Hong [16].
In this paper, for convenience, we fix the number of groups in the simulations; the resulting choice is consistent with the value given by the formula g = ⌈8 log(1/ζ)⌉ suggested by Emilien et al. [13]. Throughout the paper, the abbreviations B, U, N and P denote the binomial, uniform, normal and Poisson distributions, respectively, and N(0, 1) denotes the standard normal distribution. We set the number of observations T to 100, 200, ..., 1000.

Example 1. We consider the model y_i = 0.8^{x_i} + ε_i. The observations are grouped according to the grouping principle above, taking into account the dispersion of the data set (the accuracy of the estimator may be affected by this dispersion). The x_i are generated from P(0.7) and the ε_i from N(0, 1). The output variable y_i has outliers, in three cases: 1%T outliers from B(20, 1/2), 2%T outliers from U(7, 8), and 2%T outliers from N(6, 2), respectively. The results are shown in Table 1.
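As a sketch of how such a replication study can be coded (we read the Example 1 model as y_i = 0.8^{x_i} + ε_i, which is our interpretation of the notation), the following Python fragment generates one contamination scenario and accumulates the MSE of the MOM estimator over D replications; the group count, random seed and starting value are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
model = lambda x, a: a ** x                 # assumed reading of Example 1: y = 0.8^x + eps
theta_true, D, T, g = 0.8, 1000, 200, 10

mse_mom = 0.0
for _ in range(D):
    x = rng.poisson(0.7, T).astype(float)               # x_i ~ P(0.7)
    y = model(x, theta_true) + rng.standard_normal(T)   # eps_i ~ N(0, 1)
    k = int(0.02 * T)                                    # contaminate 2% of the responses
    y[rng.choice(T, k, replace=False)] = rng.uniform(7, 8, k)   # outliers from U(7, 8)
    n = T // g
    ests = [curve_fit(model, x[j*n:(j+1)*n], y[j*n:(j+1)*n],
                      p0=[0.5], bounds=(0.01, 5.0))[0][0] for j in range(g)]
    mse_mom += (np.median(ests) - theta_true) ** 2 / D

print(round(100 * mse_mom, 3))              # same x100 scale as Tables 1-3
```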

Example 2.
We consider the model y_i = x_i^{0.6} + ε_i, where the x_i are generated from U(2, 3) and the ε_i from N(0, 1). The output variable y_i has outliers, in three cases: 1%T outliers from B(22, 1/2), 2%T outliers from U(7, 8), and 2%T outliers from N(7, 3), respectively. The results are shown in Table 2.

Example 3. We consider the model y_i = e^{0.5 x_i} + ε_i, where the x_i are generated from U(−1, 0) and the ε_i from N(0, 1). The output variable y_i has outliers, in three cases: 1%T outliers from B(20, 1/2), 2%T outliers from N(6, 2), and 2%T outliers from U(6, 7), respectively. The results are shown in Table 3. From Tables 1 and 3, the results show that there are no significant differences between the MSE of the θ̂_NLS estimator and the θ̂_EL estimator when T is large.

Example 4.
We consider the model y_i = 0.8^{x_i} + ε_i, where the x_i are generated from P(0.7) and the ε_i from N(0, 1). For the power, we use θ + θ_0 with θ_0 ∈ {0.1, 0.2} as the alternative hypothesis. The results are shown in Table 4. MOMEL represents the empirical likelihood test based on the MOM method, and EL represents the hypothesis test based on the EL estimator.

Example 5. We consider the model y_i = x_i^{0.6} + ε_i, where the x_i are generated from U(2, 3) and the ε_i from N(0, 1). For the power, we use θ + θ_0 with θ_0 ∈ {0.1, 0.15} as the alternative hypothesis. The results are shown in Table 5.

Example 6. We consider the model y_i = e^{0.5 x_i} + ε_i, where the x_i are generated from U(−1, 0) and the ε_i from N(0, 1). For the power, we use θ + θ_0 with θ_0 ∈ {0.2, 0.3} as the alternative hypothesis. The results are shown in Table 6.
From the simulation results displayed in Tables 4–6, we can see that the size of the proposed test is close to 0.05 and the power is close to 1 as T increases. Especially when T is small, the results of MOMEL are significantly better than those of EL. As T increases, MOMEL also performs better in terms of size and power, although the power of both methods tends to one. In summary, the proposed test is preferable.

The Real Data Analysis
In this section, we apply the MOM method to analyze the 2019 GDP data of the top 50 cities of China. Based on the presentation of Zhu et al. [17], there are many methods to test whether there are outliers in the data, such as the 4d test, the 3σ principle, the Chauvenet method, the t-test, and the Grubbs test. Sun [18] also introduced the box-plot method. Different test methods will flag different outliers, so, following the suggestion of Sun [18], we use the box plot shown in Figure 1 to confirm the existence of outliers in the actual data. The outliers are 381.55, 353.71, 269.27, and 236.28 (unit: ten billion RMB).
We also use the 3σ principle to test for outliers; the result shows that the outliers are 381.55 and 353.71. From these two tests, we can conclude that there are outliers in this real data set.
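Both screening rules are straightforward to reproduce; the snippet below applies the usual 1.5×IQR box-plot rule and the 3σ rule to a vector of GDP values (the file name and variable names are hypothetical).

```python
import numpy as np

def boxplot_outliers(v):
    """Values outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR] (the usual box-plot rule)."""
    v = np.asarray(v, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return v[(v < q1 - 1.5 * iqr) | (v > q3 + 1.5 * iqr)]

def three_sigma_outliers(v):
    """Values farther than 3 standard deviations from the mean (3-sigma principle)."""
    v = np.asarray(v, dtype=float)
    return v[np.abs(v - v.mean()) > 3 * v.std(ddof=1)]

# gdp = np.loadtxt("gdp_top50_2019.txt")   # hypothetical file with the 50 GDP values
# print(boxplot_outliers(gdp), three_sigma_outliers(gdp))
```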
Yin and Du [19] introduced a power-law distribution for this type of data. For the purpose of accurately predicting the GDP development trend of major cities in China, we fit the curve with the EL method, the MOM method and the NLS method, respectively, where x_i represents the rank of the GDP of the 50 cities in descending order. The dataset is from www.askci.com (accessed on 15 February 2021).
The EL, MOM and NLS methods each yield a fitted nonlinear regression equation. In Figure 2, the red line represents the fitting result of the NLS method, the blue line the fitting result of the MOM method, the black line the fitting result of the EL method, and the yellow points the true values of GDP. In actual data, the true values of the parameters are unknown, so we cannot calculate the MSE of the parameter estimates. Instead we use the Mean Absolute Error (MAE), the average of the absolute errors, defined as
MAE = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i|.
In the actual data, y_i refers to the true value of GDP and ŷ_i refers to the GDP value obtained from the fitted nonlinear regression model, so we calculate the MAE of each fit. The MAE of the EL method is 11.984, the MAE of the NLS method is 12.024, and the MAE of the MOM method is 11.982. Cross-validation is used to examine the accuracy of forecasting: we randomly take 40 observations as training data and the other 10 as forecasting data, with 1000 independent replications. The MAEs of EL, NLS and MOM are 14.206, 14.271 and 12.242, respectively. These results suggest that MOM is more plausible than NLS and EL.
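Since the fitted equations are not reproduced above, the following Python sketch assumes a power-law form y = a·x^b for GDP versus rank (our assumption, in the spirit of Yin and Du [19]) and shows how the random 40/10 cross-validation MAE of the MOM fit can be computed; the starting values and group count are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b):
    """Assumed power-law form for GDP versus rank, y = a * x^b."""
    return a * np.power(x, b)

def mom_fit(x, y, g, p0):
    """Group-wise NLS followed by a componentwise median (MOM fit of the power law)."""
    n = len(y) // g
    ests = [curve_fit(power_law, x[j*n:(j+1)*n], y[j*n:(j+1)*n], p0=p0, maxfev=10000)[0]
            for j in range(g)]
    return np.median(ests, axis=0)

def cv_mae_mom(x, y, n_rep=1000, n_test=10, g=5, p0=(300.0, -0.5), seed=0):
    """Random 40/10 splits: fit on the training cities, average the MAE on the held-out ones."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    maes = []
    for _ in range(n_rep):
        idx = rng.permutation(len(y))
        tr, te = idx[:-n_test], idx[-n_test:]
        a, b = mom_fit(x[tr], y[tr], g, p0)
        maes.append(np.mean(np.abs(y[te] - power_law(x[te], a, b))))
    return float(np.mean(maes))

# Usage with a hypothetical data file:
# gdp = np.loadtxt("gdp_top50_2019.txt")
# print(cv_mae_mom(np.arange(1, 51), gdp))
```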

Conclusions
It has been shown that the NLS method is not robust to outliers (Gao and Li [11]). In this paper, we first apply the MOM method to the nonlinear regression model and introduce its theory, giving theoretical results on the consistency and asymptotic normality of the MOM estimator. Second, we propose a new test based on the empirical likelihood method. Third, we use the MOM method to estimate the parameters of three forms of nonlinear regression models and compare the MSE of θ̂_NLS, θ̂_MOM and θ̂_EL. The results show that the MSE of θ̂_MOM is the smallest (Tables 1–3), and the size and power results demonstrate the superiority of the MOM method (Tables 4–6). Finally, the MOM method is applied to predict the GDP development of cities in China; the MAE values show that the prediction of the MOM method is better than that of the NLS method. In summary, the MOM method does not require eliminating outliers: regardless of whether there are outliers in the data, the MOM method yields a robust estimate.

Acknowledgments: We thank Shaochen Wang and Wang Zhou for their help, and we thank the reviewers for their constructive comments.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In this Appendix A, we give the technical proofs of Theorems 1-3.
Lemma A1 (Chernoff's inequality, cf. Vershynin [20], Theorem 2.3.1). Let X_i (i = 1, ..., n) be independent Bernoulli random variables with parameters p_i. Consider their sum M_n = ∑_{i=1}^{n} X_i and its mean μ = E(M_n). Then, for any t > μ, we have
P(M_n ≥ t) ≤ e^{−μ} (eμ/t)^t.

Proof of Theorem 1. In accordance with condition (I) of Theorem 1 and Lemma 1 of Ivanov [21], for n ≥ N_0 and sufficiently large positive ρ we have
P(√n |θ̂^(j) − θ| ≥ ρ) ≤ c ρ^{−s},
where c does not depend on n and ρ. According to Wu [22], the least squares estimate of σ² in group j (j = 1, ..., g) is
σ̂²_(j) = n^{−1} ∑_{i in group j} (y_i − g(θ̂^(j), x_i))².
According to formula (2) and conditions (II), (III) and (IV), Theorem 5 of Wu [12] gives the asymptotic normality of √n(θ̂^(j) − θ). According to Pinelis [23], with a constant C_1, this convergence holds with a Berry–Esseen-type rate, where Φ represents the cumulative distribution function of the standard normal distribution. Define the random variables α_{n,j} as in formula (A4). By formula (A4), for all H > 0 and by an elementary inequality, the remainder term is o(n^{−1/2}) for large n and fixed H > 0; similarly, we obtain a bound with a constant C_2 that depends on H but not on n. It is then easy to verify the required inequality. Finally, define Bernoulli random variables from the group-wise deviations and apply Lemma A1 in the last step. This ends the proof of Theorem 1.
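For readers reconstructing the last step, the display below sketches the standard way Lemma A1 combines with a group-wise deviation bound in a median-of-means argument; the thresholds and constants are illustrative, not the paper's exact derivation.

```latex
% Sketch (under the assumption of the group-wise bound E(B_j) <= c * rho^{-s}):
\[
  B_j = \mathbf{1}\{\sqrt{n}\,|\hat\theta^{(j)} - \theta| \ge \rho\}, \qquad
  \mathbb{E}(B_j) \le c\,\rho^{-s}, \qquad j = 1, \dots, g .
\]
\[
  \mathbb{P}\!\left(\sqrt{n}\,|\hat\theta_{MOM} - \theta| \ge \rho\right)
  \le \mathbb{P}\!\Big(\sum_{j=1}^{g} B_j \ge \tfrac{g}{2}\Big)
  \le e^{-\mu}\Big(\frac{2e\mu}{g}\Big)^{g/2},
  \qquad \mu = \mathbb{E}\sum_{j=1}^{g} B_j \le g\,c\,\rho^{-s},
\]
% so the right-hand side decays geometrically in g once c * rho^{-s} < 1/(2e).
```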
For any fixed x, we define i.i.d. random variables π_{n,j}(x) = I(α_{n,j} ≤ x), j = 1, ..., g, and let p_n(x) = P(α_{n,j} ≤ x) in accordance with formula (A4), for all real x. The following lemma gives a central limit theorem for the partial sums of π_{n,j}(x).
Proof of Lemma 2. For convenience, we write π_{n,j}(x) as π_{n,j}. By independence, for any real t and i = √−1, we compute the characteristic function and apply a Taylor expansion, using the bound |p_n − Φ(x)| = O(n^{−1/2}). When n/g → ∞ and g → ∞, the first conclusion of Lemma 2 follows from formula (A13). For the second conclusion, we note that the above calculations still hold if we replace x with x g^{−1/2}, and the result follows. Formula (A15) is then obtained by virtue of Slutsky's theorem.

Proof of Theorem 2.
(1) This follows immediately from formula (A4) and the continuous mapping theorem, since the median function is continuous.
(2) We consider the quantity involving √N. First assume that g is odd; then, for any real x, the probability in question can be written through the partial sums of the π_{n,j}, and by the above lemma it tends to Φ(√(2/π) x). If g is even, P(√g Median(α_{n,j}, j = 1, ..., g) ≤ x) can be bounded between two such probabilities, and the right-hand sides of the two inequalities tend to Φ(√(2/π) x) as g → ∞.

Proof of Theorem 3. Recall the definition of T_{n,j}, j = 1, ..., g; then formula (6) becomes
∑_{j=1}^{g} (T_{n,j} − 0.5) / (1 + λ(T_{n,j} − 0.5)) = 0.
Noting the above, this completes the proof.