Abstract
Optimal subsampling is a statistical methodology for generalized linear models (GLMs) that enables fast inference about parameter estimation in massive-data regression. The existing literature considers only bounded covariates. In this paper, we first obtain the asymptotic normality of the subsampling M-estimator based on the Fisher information matrix. We then study the asymptotic properties of subsampling estimators for unbounded GLMs with nonnatural links, including both conditional and unconditional asymptotic properties.
1. Introduction
In recent years, the amount of information that must be processed has increased dramatically, and directly analyzing such massive data for statistical purposes is a great challenge. The divide-and-conquer strategy can mitigate the difficulty of processing big data directly [1], but it still consumes considerable computing resources. As a computationally cheaper alternative, subsampling proves its value when computing resources are limited.
To reduce the burden on the machine, subsampling strategies for big data have received growing attention in recent years. Ref. [2] proposes simple necessary and sufficient conditions for a convolved subsampling estimator to produce a normal limit that matches the target of bootstrap estimation; Ref. [3] provides an optimal distributed subsampling scheme for maximum quasi-likelihood estimators with massive data; Ref. [4] studies adaptive optimal subsampling algorithms; and Ref. [5] describes a subdata selection method based on leverage scores, which conducts linear model selection on a small subdata set.
The GLM is a class of statistical models with a wide range of applications; see, e.g., [6,7,8]. Many subsampling studies are based on GLMs, such as [3,9,10]. However, the covariates of the subsampled GLMs in the existing literature are bounded. In some big data problems, the covariates are not strictly bounded; for instance, the number of clicks on a web page can grow without limit. This calls for an extension of the existing theory to unbounded designs. To fill this gap, this paper studies the asymptotic properties of subsampled GLMs with unbounded covariates, based on empirical process and martingale techniques.
Our three contributions are as follows: (1) we establish the asymptotic property of the subsampled M-estimator via the Fisher information matrix; (2) we prove the conditional consistency and asymptotic normality of the subsampling estimator for unbounded GLMs; (3) we prove the unconditional consistency and asymptotic normality of the subsampling estimator for unbounded GLMs.
The rest of the paper is organized as follows. Section 2 introduces the basic concepts of GLMs and the subsampling M-estimation problem. Section 3 presents the asymptotic properties of subsampling estimators for unbounded GLMs. Section 4 gives the conclusion and discussion, as well as future research directions. All technical proofs are collected in Appendix A.
2. Preliminaries
This section introduces the subsampling M-estimation problem and GLMs.
2.1. Subsampling M-Estimation
Let $\{f(\theta, z) : \theta \in \Theta\}$ be a set of loss functions with a finite-dimensional convex parameter set $\Theta$, and let $U = \{1, \dots, N\}$ be the index set of the full large dataset with $\sigma$-algebra $\mathcal{F}_N$, where for each $i \in U$ the random data point $z_i$ (on some probability space) is observed. The empirical risk is given by $L_N(\theta) = N^{-1}\sum_{i=1}^{N} f(\theta, z_i)$.
The goal is to find the solution that minimizes the risk, namely
$$\hat{\theta}_N := \arg\min_{\theta \in \Theta} L_N(\theta). \tag{1}$$
To solve Equation (1), the minimizer $\hat{\theta}_N$ needs to satisfy the first-order condition $\nabla L_N(\hat{\theta}_N) = 0$. This is an M-estimation problem; see [11]. To solve the large-scale estimation in Equation (1) quickly, we propose subsampling M-estimation. Consider an index set $S = \{i_1, \dots, i_n\}$ drawn with replacement from U according to sampling probabilities $\{\pi_i\}_{i=1}^{N}$ such that $\sum_{i=1}^{N} \pi_i = 1$. The subsampling M-estimation problem is to obtain the solution $\tilde{\theta}_n$ satisfying
$$\tilde{\theta}_n := \arg\min_{\theta \in \Theta} L_n^{*}(\theta), \qquad L_n^{*}(\theta) = \frac{1}{n}\sum_{k=1}^{n} \frac{f(\theta, z_k^{*})}{N \pi_k^{*}},$$
where $z_k^{*}$ is the k-th subsample drawn with replacement and $\pi_k^{*}$ is its subsampling probability. For example, if $i_1 = 5$, then $z_1^{*} = z_5$ and $\pi_1^{*} = \pi_5$; if $i_2 = 5$ as well, then $z_2^{*} = z_5$, since sampling is with replacement. Denote by $n_i$ the number of times the i-th data point is subsampled, so that $\sum_{i=1}^{N} n_i = n$. The risk $L_n^{*}(\theta)$ is constructed by the inverse probability weighting technique so that $\mathbb{E}\{L_n^{*}(\theta) \mid \mathcal{F}_N\} = L_N(\theta)$; see [12]. Details on the properties of conditional expectation can be found in [13].
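To make the procedure concrete, here is a minimal runnable sketch of subsampling M-estimation with inverse probability weighting, using the logistic (negative Bernoulli log-likelihood) loss as one example of f and uniform sampling probabilities; the simulated data, variable names, and the BFGS call are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic full data; the logistic loss plays the role of f(theta, z_i).
N, p = 100_000, 5
X = rng.normal(size=(N, p))
theta_true = np.ones(p) / np.sqrt(p)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_true)))

def subsampled_risk(theta, Xs, ys, pi_s):
    """Inverse-probability-weighted risk L_n^*(theta): each subsampled loss is
    reweighted by 1/(N * pi_i) so that E[L_n^* | F_N] = L_N(theta)."""
    eta = Xs @ theta
    loss = np.logaddexp(0.0, eta) - ys * eta   # negative Bernoulli log-likelihood
    return np.mean(loss / (N * pi_s))

# Draw n indices with replacement according to the probabilities pi.
n = 1_000
pi = np.full(N, 1.0 / N)                       # uniform subsampling for simplicity
S = rng.choice(N, size=n, replace=True, p=pi)

theta_tilde = minimize(subsampled_risk, np.zeros(p),
                       args=(X[S], y[S], pi[S]), method="BFGS").x
print("subsampled M-estimate:", theta_tilde)
```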
2.2. Generalized Linear Models
Let the random variable Y follow a distribution in the natural exponential family indexed by the parameter $\theta$,
$$f(y; \theta) = c(y)\exp\{\theta y - b(\theta)\},$$
where $\theta$ is often referred to as the canonical parameter belonging to its natural space
$$\Theta = \left\{\theta : \int c(y)\, e^{\theta y}\, \mathrm{d}\mu(y) < \infty\right\}.$$
Here $\mu$ is the Lebesgue measure for continuous distributions (normal, Gamma) or the counting measure for discrete distributions (binomial, Poisson, negative binomial). The function $c(y)$ is free of $\theta$.
Let $\{(x_i, y_i)\}_{i=1}^{N}$ be N independent sample data pairs. Here $x_i \in \mathbb{R}^p$ is the covariate vector, and we assume that the response $y_i$ follows a distribution in the natural exponential family with parameter $\theta_i$. The covariates are supposed to be deterministic.
The conditional expectation of $y_i$ for a given $x_i$ is defined as a function of $x_i^{\top}\beta$ after a transformation by a link function. The mean value, denoted $\mu(\theta) = \mathbb{E}(Y) = b'(\theta)$, is mostly considered for regression.
If $\theta_i = x_i^{\top}\beta$, then the link is called the canonical (or natural) link function, and the corresponding model is the canonical (or natural) GLM; see page 32 in [14]. Sometimes this assumption is rather strong and not very suitable in practice, while nonnatural-link GLMs allow more flexible choices of the link function. We can further assume that $\theta_i$ and $x_i^{\top}\beta$ are related by a nonnatural link function h, i.e., $\theta_i = h(x_i^{\top}\beta)$.
Let m be the joint density function of the i.i.d. data from the exponential family with link function h. Then the nonnatural GLM [15] is defined by
$$m(\beta) = \prod_{i=1}^{N} c(y_i)\exp\left\{h(x_i^{\top}\beta)\, y_i - b\big(h(x_i^{\top}\beta)\big)\right\}.$$
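To illustrate a nonnatural link concretely, the sketch below fits a Bernoulli GLM with a probit link; the canonical link for the Bernoulli family is the logit, so the probit is nonnatural. It assumes a recent version of statsmodels, and the simulated data are our own illustrative setup.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(1)
N = 5_000
X = sm.add_constant(rng.normal(size=(N, 3)))   # design matrix with intercept
beta = np.array([0.2, 0.5, -0.3, 0.1])

# Probit model: the mean is Phi(x'beta), so theta_i = h(x_i'beta) with h a
# nonnatural link (the natural/canonical choice for Bernoulli is the logit).
y = rng.binomial(1, norm.cdf(X @ beta))

probit = sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.Probit()))
print(probit.fit().params)                     # should be close to beta
```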
3. Main Results
3.1. Subsampling M-Estimation Problem
In this part, we first examine the subsampled risk term. Define an independent random vector sequence corresponding to the subsampled draws, in which each vector takes its value among the full-data points, and let
From this definition, the subsampled risk is conditionally unbiased for the full-data risk. We then have the following asymptotic property of the subsampled M-estimator.
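Since the paper's own displays here are not recoverable, the following hedged note records the multinomial law of the subsample counts under with-replacement sampling, in the notation introduced above; it is the fact used implicitly in Lemma A2.

```latex
(n_1, \dots, n_N) \mid \mathcal{F}_N \;\sim\; \operatorname{Multinomial}(n;\, \pi_1, \dots, \pi_N),
\qquad
\mathbb{E}(n_i \mid \mathcal{F}_N) = n\pi_i,
\qquad
\operatorname{Var}(n_i \mid \mathcal{F}_N) = n\pi_i(1-\pi_i).
```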
Theorem 1.
Suppose that the risk function is twice differentiable and λ-strongly convex over Θ, that is, $\nabla^2 L_N(\theta) \succeq \lambda I_p$ for all $\theta \in \Theta$, where $\succeq$ denotes the positive semidefinite ordering; and suppose that the sampling-based moment condition holds,
Then we obtain the following: conditionally on $\mathcal{F}_N$, as $n \to \infty$,
where the arrow denotes convergence in distribution.
Theorem 1 reveals that the subsampling M-estimation scheme is theoretically feasible under mild conditions. In addition, the asymptotic variance of the estimator is given by the Fisher information matrix.
3.2. Conditional Asymptotic Properties of Subsampled GLMs with Unbounded Covariates
The exponential family is very versatile, containing many common light-tailed distributions such as the binomial, Poisson, negative binomial, normal, and Gamma distributions. Together with their attendant convexity properties, which lead to a finite-variance property for the log-density, they underpin a large number of popular and effective statistical models. It is precisely because these distributions are so common that we study the subsampling problem for GLMs.
From the loss function introduced in Section 2.1, we set $f(\beta, z_i) = -\log m$, where m is defined by Equation (2); the problem of minimizing the loss function is then equivalent to maximizing the likelihood function. For simplicity, we impose an additional assumption, and then
with the nonnatural link function h. We also use this idea in Section 3.3.
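For concreteness, here is a hedged reconstruction of this loss in the notation of Section 2.2 (the original display is lost; the form below follows standard exponential-family algebra with the nonnatural link h):

```latex
f(\beta, z_i) \;=\; -\log m(y_i; x_i, \beta)
  \;=\; -\,h(x_i^{\top}\beta)\, y_i \;+\; b\!\big(h(x_i^{\top}\beta)\big) \;-\; \log c(y_i),
```

so that minimizing $L_N(\beta) = N^{-1}\sum_{i=1}^{N} f(\beta, z_i)$ is exactly maximizing the log-likelihood.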
More generally, we consider a wider class, called quasi-GLMs rather than GLMs, which assumes that Equation (4) holds for a certain function. Strong consistency and asymptotic normality of the quasi maximum likelihood estimate in GLMs with bounded covariates were proved in [17]. For unbounded covariates, adopting the subsampled estimation of GLMs in [9], we compute the inverse-probability-weighted estimator of β by solving the estimating equation based on the subsampled index set S,
where the starred quantities denote the subsampled data. Equivalently, we have
Let $\tilde{\beta}_n$ be the estimator of the real parameter $\beta_0$ in the subsampled quasi-GLMs and $\hat{\beta}_N$ be the estimator of $\beta_0$ in the quasi-GLMs with full data. For unbounded quasi-GLMs with full data, $\hat{\beta}_N$ is asymptotically unbiased for $\beta_0$; see [18]. Next, we focus on the asymptotic properties of $\tilde{\beta}_n$, as shown in the following theorems.
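As a sketch of how such an inverse-probability-weighted estimating equation can be solved in practice, the code below works through a probit example (a nonnatural link) with heavy-tailed, hence effectively unbounded, covariates; the score formula and the use of scipy.optimize.root are our illustrative choices rather than the paper's exact Equation (6).

```python
import numpy as np
from scipy.optimize import root
from scipy.stats import norm

rng = np.random.default_rng(2)
N, p, n = 50_000, 4, 800
X = rng.standard_t(df=5, size=(N, p))      # heavy-tailed (unbounded) covariates
beta0 = np.array([0.5, -0.5, 0.3, 0.0])
y = rng.binomial(1, norm.cdf(X @ beta0))   # probit (nonnatural link) responses

pi = np.full(N, 1.0 / N)                   # sampling probabilities
S = rng.choice(N, size=n, replace=True, p=pi)

def ipw_score(beta, Xs, ys, pi_s):
    """Inverse-probability-weighted probit score; setting it to zero mimics
    the subsampled estimating equation."""
    eta = Xs @ beta
    F = np.clip(norm.cdf(eta), 1e-10, 1 - 1e-10)   # numerical safeguard
    resid = norm.pdf(eta) * (ys - F) / (F * (1 - F))
    return Xs.T @ (resid / (N * pi_s))

beta_tilde = root(ipw_score, np.zeros(p), args=(X[S], y[S], pi[S])).x
print("subsampled IPW estimate:", beta_tilde)
```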
Theorem 2.
Let the subsample be drawn from the i.i.d. full data. Consider Equations (4) and (6), where is three times continuously differentiable with every derivative bounded, and is twice continuously differentiable with every derivative bounded. Assume the following:
- (A.1)
- The range of the unknown parameter is an open subset of $\mathbb{R}^p$.
- (A.2)
- For any , .
- (A.3)
- For any and , , where .
- (A.4)
- For any and , there exists a function such that
- (A.5)
- When and , where and is the smallest eigenvalue of the matrix .
- (A.6)
- , .
Then $\tilde{\beta}_n$ is consistent with $\hat{\beta}_N$, i.e.,
where the convergence is in probability, conditionally on the full data $\mathcal{F}_N$.
Theorem 3.
Under the conditions of Theorem 2, as $n \to \infty$ and $N \to \infty$, conditionally on $\mathcal{F}_N$ in probability,
in distribution, where
In this part, we have established the asymptotic properties without the moment condition on the covariates that is used in [9]; that is, the covariates may be unbounded. Here we provide only the theoretical asymptotic results. Furthermore, the subsampling probabilities can be derived from the A-optimality criterion, as in [10].
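As an illustration of what such probabilities look like, the sketch below computes A-optimality-style subsampling probabilities for logistic regression in the spirit of [10], plugging in a pilot estimate; the exact weights for the unbounded nonnatural-link setting of this paper would require the derivation referred to above, so treat this as an assumption-laden template.

```python
import numpy as np

def a_optimal_probs(X, y, beta_pilot):
    """A-optimality-style probabilities for logistic regression, following the
    pattern pi_i ~ |y_i - p_i| * ||M^{-1} x_i|| from [10] (our paraphrase)."""
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta_pilot))
    W = p_hat * (1.0 - p_hat)
    M = (X * W[:, None]).T @ X / len(y)                      # average information
    lev = np.linalg.norm(np.linalg.solve(M, X.T).T, axis=1)  # ||M^{-1} x_i||
    scores = np.abs(y - p_hat) * lev
    return scores / scores.sum()
```

In practice, beta_pilot would come from a small uniform pilot subsample, after which the main subsample is drawn with the returned probabilities.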
3.3. Unconditional Asymptotic Properties of Subsampled GLMs with Unbounded Covariates
In real engineering applications, measuring some response variables is very expensive, as with superconductor data, deep-space exploration data, etc. The accuracy of estimating the target parameters under measurement constraints on the responses is thus a very important issue. Ref. [19] established the unconditional asymptotic properties of parameter estimation in bounded GLMs with the canonical link, but the case of unbounded GLMs with a nonnatural link has not been discussed yet.
In this section, we continue with the notation of Section 3.2. Using the theory of empirical processes [11], we obtain the unconditional consistency of $\tilde{\beta}_n$ in the following theorem.
Theorem 4.
(Unconditional subsampled consistency) Assume the conditions:
- (B.1)
- where is the unbounded covariate of GLMs.
- (B.2)
- For , where and is the second derivative with respect to .
- (B.3)
- where is the first derivative with respect to .
- (B.4)
- in (3) is twice continuously differentiable, and each of its derivatives has a positive minimum.
- (B.5)
- in (3) is twice continuously differentiable, and each of its derivatives has a positive minimum.
Then $\tilde{\beta}_n \to \beta_0$ in probability.
Theorem 4 directly gives the unconditional consistency of the subsampling estimator $\tilde{\beta}_n$ with respect to the true parameter $\beta_0$ under the unboundedness assumption.
To prove the asymptotic normality of $\tilde{\beta}_n$ with respect to $\beta_0$, we briefly recall the subsampled score function from Section 3.2:
Next, we apply a multivariate martingale central limit theorem (Lemma 4 in [19]), which extends Theorem A.1 in [20], to show the asymptotic normality of $\tilde{\beta}_n$. Let be a filtration adapted to the sampling: , where is the σ-algebra generated by the first i sampling steps. The subsample size n is assumed to increase with N. Based on this filtration, we define the martingale
where is a martingale difference sequence adapted to . In addition, define ; ; and , where matrix is the symmetric square root of , i.e., , and . is the variance of .
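A small simulation can make the martingale-difference structure tangible: conditionally on the data, each with-replacement draw of a reweighted score term has mean equal to the full-data average, so the centered terms have conditional mean zero. The stand-in score values psi and the probabilities below are our own illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, reps = 2_000, 200, 5_000
psi = rng.normal(size=N)              # stand-in for the score contributions
pi = np.abs(psi) + 0.1
pi /= pi.sum()                        # generic nonuniform sampling probabilities

full_mean = psi.mean()                # (1/N) * sum_i psi_i

# For each draw k, xi_k = psi_{i_k} / (N * pi_{i_k}) - full_mean has conditional
# mean zero given the past, since the draws are i.i.d. given the data.
draws = rng.choice(N, size=(reps, n), replace=True, p=pi)
xi = psi[draws] / (N * pi[draws]) - full_mean
print("max_k |mean over reps of xi_k|:", np.abs(xi.mean(axis=0)).max())  # ~ 0
```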
The following theorem shows the asymptotic normality of the estimator .
Theorem 5.
Assume the conditions,
- (C.1)
- is finite and nonsingular.
- (C.2)
- , for , where denotes the k-th element of the vector and denotes the j-th element of the vector .
- (C.3)
- is three times continuously differentiable for every x within its domain.
- (C.4)
- For any , .
- (C.5)
- and .
- (C.6)
- ,
- (C.7)
Then
Here, we establish the unconditional asymptotic properties of the subsampling estimator for unbounded GLMs. The stated condition ensures that small subsamples also achieve the expected performance, which greatly reduces the computational cost. We also present theoretical asymptotic results, which lead to the subsampling probabilities given by the A-optimality criterion in [10].
4. Conclusions and Future Work
In this paper, we derived the asymptotic normality of the subsampling M-estimator via the Fisher information. For unbounded GLMs with a nonnatural link function, we obtained the conditional and unconditional asymptotic properties of the subsampling estimator separately.
For future study, it would be meaningful to apply the sub-Weibull concentration inequalities in [21] to draw nonasymptotic inferences. Importance sampling is not ideal, since it tends to assign high sampling probabilities to samples that have already been observed. Hence, more effective subsampling methods for GLMs are worth considering, such as the Markov subsampling in [22]. Moreover, the high-dimensional methods in [23,24] deserve further study in the subsampling setting.
Author Contributions
Conceptualization, B.T.; Methodology, Y.Z.; Validation, G.T.; Writing—original draft, G.T.; Writing—review & editing, B.T., Y.Z. and S.F.; Supervision, B.T.; Funding acquisition, Y.Z. and B.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Key University Science Research Project of Jiangsu Province 21KJB110023 and National Natural Science Foundation of China 91646106.
Data Availability Statement
Not applicable.
Acknowledgments
We would like to thank Huiming Zhang for helpful discussions on large sample theory.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Technical Details
Lemma A1
(Theorem 4.17 in [16]). Let be i.i.d. from a p.d.f. w.r.t. a σ-finite measure ν on , where and Θ is an open set in . Suppose that for every in the range of , is twice continuously differentiable on and satisfies
- (D.1)
- for and .
- (D.2)
- The Fisher information matrix is positive definite.
- (D.3)
- For any given , there exists a positive number and a positive function such that and for all in the range of , where denotes the Euclidean norm and for any matrix . Then there exists a sequence of estimators (based on the full data) such that where is the likelihood function of the full data and is the real parameter. Meanwhile, there exists a sequence of estimators (based on the subsampled data) such that where is the likelihood function of the subsampled data and is the real parameter.
Let $n_i$ be the number of times the i-th data point is subsampled, so that $\sum_{i=1}^{N} n_i = n$.
Lemma A2.
.
Proof.
From the definition of , one has
□
Proposition A1.
Under the conditions of Lemma A1 and
Assume that based on is an estimator of , and based on is also an estimator of , then
Proof.
Remark A1.
The last equation in the proof ensures that the term is a higher-order infinitesimal with respect to 1, which holds in conditional probability given . The notation in Equation (A6) denotes a higher-order infinitesimal of 1 in conditional probability given .
Proof of Theorem 1.
For every constant , one has
Furthermore,
Then, by the Lindeberg-Feller central limit theorem (Proposition 2.27 of [11]), conditional on ,
Therefore, combining the above with Proposition A1, Equation (5) holds. Thus, the proof is completed. □
Proof of Theorem 2.
Next, one needs to show convexity (i.e., uniqueness and the maximum value), given the existence of the estimators from [25]. Let
where
From Theorem 4.17 in [16], one needs to show
where and is a p-dimensional identity matrix.
Let
and
Then
and
Thus, one only needs to prove
and
for any . From the definition of , and the property of trace in P288 of [16], the left-hand side of Equation (A11) can be bounded by
From condition (A.4), one needs to prove that converges to 0 so that Equation (A11) holds, and one has
Hence Equation (A11) holds. Let , and
Then . In the same way as proving Equation (A11), we have
Note that is bounded by the product of
and
Equation (A13) can be bounded as
where the penultimate equality applies Lemma A2 with
Equation (A14) can be bounded as
which can be proved by the same argument as for Equation (A11), using the Lagrange mean value theorem. Combining the bounds of Equations (A13) and (A14), one obtains
Let be a constant. Since , one has
where is a constant. Under the definitions of and , together with Theorem 1.14(ii) in [16], one obtains
Hence, Equation (A12) holds and the proof is completed. □
Proof of Theorem 3.
According to the mean value theorem, one has
where is between and , then
Let , then
According to in Equation (4), one obtains
Applying the Lindeberg–Lévy CLT, one has
where
Applying Theorem 2 in [26], one has
where
Since lies between and , and is consistent with (with respect to ) in probability, we have
where
Finally, combining Equations (A15)–(A17) via Slutsky's theorem, one obtains
where . The proof is completed. □
Proof of Theorem 4.
Here, one needs to prove the consistency of with respect to , given the existence of ; see [27].
Denote , and . Then the negative K-L divergence in [28] is bounded,
where and . Then for any , one has the well-separation condition
Let , which is essentially the log-likelihood function of the subsampled GLMs, and let be the function's maximum point. Thus, one has the near-maximization property .
Let . Now one obtains
where and both lie between and , and .
Let and by (B.3), one has
where is the -norm defined on pp. 269–270 of [11] and . Then, from Example 19.7 in [11], one obtains
where is the bracketing number, i.e., the minimum number of -brackets needed to cover ; see p. 270 in [11]. Here, K is a constant and .
Therefore, the class is P-Glivenko–Cantelli by Theorem 19.4 in [11]. From the definition of a P-Glivenko–Cantelli class on p. 269 of [11], we have
Finally, according to Theorem 5.7 in [11], we get . The proof is then completed. □
Lemma A3.
For , assume that
- (E.1)
- is finite and nonsingular.
- (E.2)
- For ,
- (E.3)
- For ,
Then,
Proof.
One derives each entry in the matrix by
By the definition of , one has
Next, one obtains
where the first equality is based on the fact that, after conditioning on the N data points, the n repeated sampling steps are independent and identically distributed. The last equality holds by conditions (E.2) and (E.3). □
Lemma A4.
Under the conditions (C.1)–(C.5) in Theorem 5, if for all large n and , then
Proof.
By Taylor’s expansion:
where and lies between and . From assumptions (C.3), (C.4) and (C.5) in Theorem 5, we have
Then . Therefore, by Lemma A3, one has
which implies
Hence, the proof is completed. □
Lemma A5.
is a martingale difference sequence adapted to the filtration .
Proof.
The ’s are -measurable by the definitions of and of the filtration . Then we obtain
By the definition of martingale difference sequence in P230 of [29], the proof is completed. □
Under the definition of , it is obvious that .
Lemma A6.
.
Proof.
By the symmetry of , we only need to show that is positive definite for any N.
Therefore, is equivalent to the positive definiteness of the matrix . The proof is completed. □
Lemma A7
(Multivariate version of the martingale CLT, Lemma 4 in [19]). For , let be a martingale difference sequence relative to the filtration and let be an -measurable random vector. Set . Assume that
- (F.1)
- ;
- (F.2)
- for some sequence of positive definite matrices with , i.e., the largest eigenvalues are uniformly bounded;
- (F.3)
- For some probability distribution , where ∗ denotes convolution and denotes the law of random variates:
Then
Lemma A8
(Asymptotic normality of ). Assume that
- (G.1)
- ;
- (G.2)
- .
Then
Proof.
The conditions in lemma A7 can be substituted with
By Lemma A5, conditions (F.1) and (F.2) of Lemma A7 are satisfied. Next, we only need to show that the third condition in Lemma A7 holds. According to the central limit theorem, we have
For any , let , ; since the properties of complex multivariate normal distributions are equivalent to those of real multivariate normal distributions (see p. 222 of [30]), and , one has
Thus, according to Equations (45.4)–(45.6) on p. 108 of [30], one has
Further, we obtain
Therefore, condition (F.3) in Lemma A7 is verified. Then one obtains
The proof is completed. □
Proof of Theorem 5.
According to Lemma A4,
Multiplying (A18) by , one obtains
Applying Lemma A8, one obtains
The proof is completed. □
References
- Xi, R.; Lin, N. Direct regression modelling of high-order moments in big data. Stat. Its Interface 2016, 9, 445–452. [Google Scholar] [CrossRef]
- Tewes, J.; Politis, D.N.; Nordman, D.J. Convolved subsampling estimation with applications to block bootstrap. Ann. Stat. 2019, 47, 468–496. [Google Scholar] [CrossRef]
- Yu, J.; Wang, H.; Ai, M.; Zhang, H. Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data. J. Am. Stat. Assoc. 2022, 117, 265–276. [Google Scholar] [CrossRef]
- Yao, Y.; Wang, H. A review on optimal subsampling methods for massive datasets. J. Data Sci. 2021, 19, 151–172. [Google Scholar] [CrossRef]
- Yu, J.; Wang, H. Subdata selection algorithm for linear model discrimination. Stat. Pap. 2021, 63, 1883–1906. [Google Scholar] [CrossRef]
- Fu, S.; Chen, P.; Liu, Y.; Ye, Z. Simplex-based Multinomial Logistic Regression with Diverging Numbers of Categories and Covariates. Stat. Sin. 2022, in press. [Google Scholar] [CrossRef]
- Ma, J.; Xu, J.; Maleki, A. Analysis of sensing spectral for signal recovery under a generalized linear model. Adv. Neural Inf. Process. Syst. 2021, 34, 22601–22613. [Google Scholar]
- Mahmood, T. Generalized linear model based monitoring methods for high-yield processes. Qual. Reliab. Eng. Int. 2020, 36, 1570–1591. [Google Scholar] [CrossRef]
- Ai, M.; Yu, J.; Zhang, H.; Wang, H. Optimal Subsampling Algorithms for Big Data Regressions. Stat. Sin. 2021, 31, 749–772. [Google Scholar] [CrossRef]
- Wang, H.; Zhu, R.; Ma, P. Optimal subsampling for large sample logistic regression. J. Am. Stat. Assoc. 2018, 113, 829–844. [Google Scholar] [CrossRef]
- van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: London, UK, 1998. [Google Scholar]
- Wooldridge, J.M. Inverse probability weighted M-estimators for sample selection, attrition, and stratification. Port. Econ. J. 2002, 1, 117–139. [Google Scholar] [CrossRef]
- Durrett, R. Probability: Theory and Examples, 5th ed.; Cambridge University Press: Cambridge, UK, 2019. [Google Scholar]
- McCullagh, P.; Nelder, J. Generalized Linear Models, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 1989. [Google Scholar]
- Fahrmeir, L.; Kaufmann, H. Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Stat. 1985, 13, 342–368. [Google Scholar] [CrossRef]
- Shao, J. Mathematical Statistics, 2nd ed.; Springer: New York, NY, USA, 2003. [Google Scholar]
- Yin, C.; Zhao, L.; Wei, C. Asymptotic normality and strong consistency of maximum quasi-likelihood estimates in generalized linear models. Sci. China Ser. A 2006, 49, 145–157. [Google Scholar] [CrossRef]
- Rigollet, P. Kullback-Leibler aggregation and misspecified generalized linear models. Ann. Stat. 2012, 40, 639–665. [Google Scholar] [CrossRef]
- Zhang, T.; Ning, Y.; Ruppert, D. Optimal sampling for generalized linear models under measurement constraints. J. Comput. Graph. Stat. 2021, 30, 106–114. [Google Scholar] [CrossRef]
- Ohlsson, E. Asymptotic normality for two-stage sampling from a finite population. Probab. Theory Relat. Fields 1989, 81, 341–352. [Google Scholar] [CrossRef]
- Zhang, H.; Wei, H. Sharper Sub-Weibull Concentrations. Mathematics 2022, 10, 2252. [Google Scholar] [CrossRef]
- Gong, T.; Dong, Y.; Chen, H.; Dong, B.; Li, C. Markov Subsampling Based on Huber Criterion. IEEE Trans. Neural Netw. Learn. Syst. 2022, in press. [Google Scholar] [CrossRef]
- Xiao, Y.; Yan, T.; Zhang, H.; Zhang, Y. Oracle inequalities for weighted group lasso in high-dimensional misspecified Cox models. J. Inequalities Appl. 2020, 2020, 252. [Google Scholar] [CrossRef]
- Zhang, H.; Jia, J. Elastic-net regularized high-dimensional negative binomial regression: Consistency and weak signals detection. Stat. Sin. 2022, 32, 181–207. [Google Scholar] [CrossRef]
- Ding, J.L.; Chen, X.R. Large-sample theory for generalized linear models with non-natural link and random variates. Acta Math. Appl. Sin. 2006, 22, 115–126. [Google Scholar] [CrossRef]
- Jennrich, R.I. Asymptotic properties of non-linear least squares estimators. Ann. Math. Stat. 1969, 40, 633–643. [Google Scholar] [CrossRef]
- White, H. Maximum likelihood estimation of misspecified models. Econom. J. Econom. Soc. 1982, 50, 1–25. [Google Scholar] [CrossRef]
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
- Davidson, J. Stochastic Limit Theory: An Introduction for Econometricians; OUP Oxford: Oxford, UK, 1994. [Google Scholar]
- Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions, Volume 1: Models and Applications, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2000. [Google Scholar]