Sharper Sub-Weibull Concentrations

Constant-specified and exponential concentration inequalities play an essential role in the finite-sample theory of machine learning and high-dimensional statistics. We obtain sharper and constants-specified concentration inequalities for the sum of independent sub-Weibull random variables, which leads to a mixture of two tails: sub-Gaussian for small deviations and sub-Weibull for large deviations from the mean. These bounds are new and improve existing bounds with sharper constants. In addition, a new sub-Weibull parameter is also proposed, which enables recovering the tight concentration inequality for a random variable (vector). For statistical applications, we give an $\ell_2$-error bound for the estimated coefficients in negative binomial regressions when the heavy-tailed covariates are sub-Weibull distributed with sparse structures, which is a new result for negative binomial regressions. In applying random matrices, we derive non-asymptotic versions of Bai-Yin's theorem for sub-Weibull entries with exponential tail bounds. Finally, by demonstrating a sub-Weibull confidence region for a log-truncated Z-estimator without the second-moment condition, we discuss and define the sub-Weibull type robust estimator for independent observations $\{X_i\}_{i=1}^{n}$ without exponential-moment conditions.


Introduction
In the last two decades, with the development of modern data collection methods in science and engineering, scientists and engineers can access and load a huge number of variables in their experiments. Over hundreds of years, probability theory has laid the mathematical foundation of statistics. Arising from data-driven problems, various recent advances in statistics research also contribute new and challenging probability problems for further study. For example, in recent years, the rapid development of high-dimensional statistics and machine learning has promoted the development of probability theory and even pure mathematics, such as random matrices, large deviation inequalities, and geometric functional analysis; see Vershynin (2018). More importantly, concentration inequalities (CIs) quantify the concentration of measures that are at the heart of statistical machine learning. Usually, a CI quantifies how a random variable (r.v.) $X$ deviates around its mean $\mathrm{E}X =: \mu$ by presenting one-sided or two-sided bounds for the tail probability of $X - \mu$:
$$\mathrm{P}(X - \mu > t) \ \text{ or } \ \mathrm{P}(|X - \mu| > t) \le \text{some small } \delta, \qquad \forall\, t \ge 0.$$
Classical statistical models deal with fixed-dimensional variables only. However, contemporary data science motivates statisticians to pay more attention to $p \times p$ random Hessian matrices (or sample covariance matrices, Bai and Silverstein (2010)) with $p \to \infty$, arising from the likelihood functions of high-dimensional regressions with covariates in $\mathbb{R}^p$. When the model dimension increases with the sample size, obtaining asymptotic results for the estimator is potentially more challenging than in the fixed-dimensional case. In statistical machine learning, concentration inequalities (large deviation inequalities) are essential in deriving non-asymptotic error bounds for the proposed estimator; see Wainwright (2019); Zhang and Chen (2021). Over recent decades, researchers have developed remarkable matrix concentration inequalities, which focus on non-asymptotic upper and lower bounds for the largest eigenvalue of a finite sum of random matrices. For a fascinating introduction, please refer to the book by Tropp (2015).
Motivated by sample covariance matrices, a random matrix is a matrix $A_{p\times p}$ whose entries $A_{jk}$ are drawn from some distributions. As $p \to \infty$, random matrix theory mainly focuses on the properties of the $p$ eigenvalues of $A_{p\times p}$, which turn out to obey certain limit laws. Several famous limit laws in random matrix theory differ from the CLT for sums of independent random variables, since the $p$ eigenvalues are dependent and interact with each other. For convergence in distribution, some pioneering works are Wigner's semicircle law for the eigenvalues of symmetric Gaussian matrices, the Marchenko-Pastur law for Wishart-distributed random matrices (sample covariance matrices), and the Tracy-Widom law for the limiting distribution of the maximum eigenvalue of Wishart matrices. All three laws can be regarded as random-matrix versions of the CLT. Moreover, the limit law for the empirical spectral density is a circular-type distribution, which sheds light on the non-commutative behavior of random matrices, while the classical limit law in the CLT is the normal distribution or an infinitely divisible distribution. For strong convergence, Bai-Yin's law complements the Marchenko-Pastur law by asserting the almost sure convergence of the smallest and largest eigenvalues of a sample covariance matrix. The monograph by Bai and Silverstein (2010) thoroughly introduces limit laws in random matrices.
This work aims to extend non-asymptotic results from sub-Gaussian to sub-Weibull in terms of exponential concentration inequalities, with applications in count data regressions, random matrices, and robust estimators. The contributions are: (i) We review and present some new results for sub-Weibull r.v.s, including sharp concentration inequalities for weighted sums of independent sub-Weibull r.v.s and negative binomial r.v.s, which are useful in many statistical applications.
(ii) Based on the generalized Bernstein-Orlicz norm, a sharper concentration for sub-Weibull sums is obtained in Theorem 1. Here we circumvent Stirling's approximation and derive the inequalities more subtly. As a result, the confidence interval based on our result is sharper and more accurate than those given in Kuchibhotla and Chakrabortty (2022) (for example, see Remark 2) and in Hao et al. (2019) (see Proposition 1, which involves unknown constants).
(iii) Using the sharper sub-Weibull concentrations, we give two applications. First, from the proposed negative binomial concentration inequalities, we obtain an $O_P(\sqrt{p/n})$ (up to some logarithmic factors) estimation error for the estimated coefficients in negative binomial regressions under the increasing-dimensional framework $p = p_n$ and heavy-tailed covariates. Second, we provide a non-asymptotic Bai-Yin theorem for sub-Weibull random matrices with exponential-decay high probability.
(iv) We propose a new sub-Weibull parameter, which enables recovering the tight concentration inequality for a single non-zero-mean random vector. Simulation studies for estimating the sub-Gaussian and sub-exponential parameters show that these parameters can be estimated well.
(v) We establish a unified non-asymptotic confidence region and the convergence rate for general log-truncated Z-estimators in Theorem 5. Moreover, we define a sub-Weibull type estimator for a sequence of independent observations $\{X_i\}_{i=1}^n$ without the second-moment condition, going beyond the definition of the sub-Gaussian estimator.

Sharper Concentrations for Sub-Weibull Summation
Concentration inequalities are powerful in high-dimensional statistical inference, as they yield explicit non-asymptotic error bounds as a function of sample size, sparsity level, and dimension (Wainwright, 2019). In this section, we present preparatory results on concentration inequalities for sub-Weibull random variables.

Properties of Sub-Weibull norm and Orlicz-type norm
In empirical process theory, the sub-Weibull norm (or other Orlicz-type norms) is crucial for deriving tail probabilities, both for a single sub-Weibull random variable and for sums of random variables (by using Chernoff's inequality). A benefit of Orlicz-type norms is that the concentration does not need the zero-mean assumption.
Definition 1 (Sub-Weibull norm). For $\theta > 0$, the sub-Weibull norm of $X$ is defined as
$$\|X\|_{\psi_\theta} := \inf\big\{C \in (0, \infty) : \mathrm{E}\exp(|X|^\theta / C^\theta) \le 2\big\}.$$
The $\|\cdot\|_{\psi_\theta}$ is also called the $\psi_\theta$-norm. We say $X$ is a sub-Weibull random variable with index $\theta$ if it has a bounded $\psi_\theta$-norm (denoted $X \sim \mathrm{subW}(\theta)$). The sub-Weibull norm is in fact a special case of the Orlicz norms below.
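For intuition, a finite $\psi_\theta$-norm immediately yields an exponential tail bound; the following one-line derivation (a standard argument, included for completeness) uses only Markov's inequality and the definition above.

```latex
% Tail bound implied by K := \|X\|_{\psi_\theta} < \infty:
% apply Markov's inequality to the increasing map x -> e^{x^\theta/K^\theta}.
\mathbb{P}(|X| \ge t)
  = \mathbb{P}\big( e^{|X|^\theta/K^\theta} \ge e^{t^\theta/K^\theta} \big)
  \le e^{-t^\theta/K^\theta}\, \mathbb{E}\, e^{|X|^\theta/K^\theta}
  \le 2\, e^{-(t/K)^\theta}, \qquad \forall\, t \ge 0 .
```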
Example 1 ($\psi_\theta$-norm of a bounded r.v.). For a r.v. with $|X| \le M < \infty$, we have $\|X\|_{\psi_\theta} \le (\log 2)^{-1/\theta} M$, since $\mathrm{E}\exp(|X|^\theta/C^\theta) \le \exp(M^\theta/C^\theta) \le 2$ whenever $C \ge (\log 2)^{-1/\theta} M$. In general, we have the following corollary to determine $\|X\|_{\psi_\theta}$ based on moment generating functions (MGFs). It is useful for statistical inference on the $\psi_\theta$-norm.
Remark 1. If we observe i.i.d. data $\{X_i\}_{i=1}^n$ from a sub-Weibull distribution, one can use the empirical moment generating function (EMGF, Gbur and Collins (1989)) to estimate the sub-Weibull norm of $X$. Since the EMGF $\hat m_{|X|^\theta}(C) := n^{-1}\sum_{i=1}^n \exp(|X_i|^\theta/C^\theta)$ converges to $\mathrm{E}\exp(|X|^\theta/C^\theta)$ in probability for $1/C^\theta$ in a neighbourhood of zero, one can take the value of the inverse function of the EMGF at 2. Then, under some regularity conditions, $\hat m_{|X|^\theta}^{-1}(2)$ is a consistent estimate of $\|X\|_{\psi_\theta}$.
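A minimal sketch of Remark 1's plug-in idea follows; the function name, the bracketing strategy, and the Brent root-finder are our illustrative choices, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import brentq

def emgf_psi_norm(x, theta):
    """Plug-in estimate of the psi_theta-norm from Remark 1: find the
    C > 0 at which the empirical MGF (1/n) * sum_i exp(|x_i|^theta / C^theta)
    equals 2.  Can be numerically unstable for heavy tails (see the
    discussion in the sub-Weibull parameter section)."""
    z = np.abs(x) ** theta
    f = lambda c: np.mean(np.exp(z / c ** theta)) - 2.0  # decreasing in c
    c0 = max(np.median(z), 1e-8) ** (1.0 / theta)
    lo, hi = c0 / 2.0, c0 * 2.0
    while f(hi) > 0:      # push the upper bracket until f changes sign
        hi *= 2.0
    while f(lo) < 0:      # push the lower bracket down if needed
        lo /= 2.0
    return brentq(f, lo, hi)

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)           # N(0,1) is subW(2)
print(emgf_psi_norm(x, theta=2.0))     # ~ sqrt(8/3) ~ 1.633
```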
In particular, taking $\theta = 1$ gives the sub-exponential norm $\|X\|_{\psi_1}$ of $X$. From Zhang and Chen (2021), we know that for independent sub-exponential r.v.s $\{Y_i\}_{i=1}^n$ and all $t \ge 0$, a two-rate Bernstein-type bound of the form
$$\mathrm{P}\Big(\Big|\sum_{i=1}^n w_i(Y_i - \mathrm{E}Y_i)\Big| \ge t\Big) \le 2\exp\Big\{-c\min\Big(\frac{t^2}{\|w\|_2^2 \max_i \|Y_i\|_{\psi_1}^2},\ \frac{t}{\|w\|_\infty \max_i \|Y_i\|_{\psi_1}}\Big)\Big\} \qquad (3)$$
holds, with the constant $c > 0$ specified there. Example 2. An explicit calculation of the sub-exponential norm is given in Götze et al. (2021), who derive the $\psi_1$-norm of a Poisson r.v. in closed form.
Moreover, Example 1 together with the triangle inequality yields a bound for the centered version $X - \mathrm{E}X$, based on the following useful result.
Proposition 1 (Lemma A.3 in Götze et al. (2021)). For any $\alpha > 0$ and any r.v.s $X, Y$, we have
$$\|X + Y\|_{\psi_\alpha} \le K_\alpha\big(\|X\|_{\psi_\alpha} + \|Y\|_{\psi_\alpha}\big),$$
where $K_\alpha := 2^{1/\alpha}$ if $\alpha \in (0, 1)$ and $K_\alpha = 1$ if $\alpha \ge 1$. To go beyond Poisson variables, one can also consider concentration for sums of independent heterogeneous negative binomial variables $\{Y_i\}_{i=1}^n$; under the standard mean-dispersion parameterization $\mathrm{NB}(\mu_i, k_i)$, the probability mass functions are
$$\mathrm{P}(Y_i = y) = \frac{\Gamma(y + k_i)}{\Gamma(k_i)\, y!}\Big(\frac{k_i}{k_i + \mu_i}\Big)^{k_i}\Big(\frac{\mu_i}{k_i + \mu_i}\Big)^{y}, \qquad y = 0, 1, 2, \ldots,$$
where $\{k_i\}_{i=1}^n \in (0, \infty)$ are variance-dependence parameters. Here, the mean and variance of $Y_i$ are $\mathrm{E}Y_i = \mu_i$ and $\mathrm{Var}\,Y_i = \mu_i + \mu_i^2/k_i$. Based on (3), we obtain the following results.
Corollary 2. For any independent sub-exponential r.v.s $\{Y_i\}_{i=1}^n$ and non-random weights, a weighted-sum concentration inequality holds; in particular, if $Y_i$ is independently distributed as $\mathrm{NB}(\mu_i, k_i)$, the bound holds with the $\psi_1$-norms replaced by explicit functions $a(\mu_i, k_i)$ of the negative binomial parameters. Corollary 2 can play an important role in many non-asymptotic analyses of various estimators. For instance, Li (2022) recently used the above inequality as an essential tool for deriving the non-asymptotic behavior of a penalized estimator in a count data model.
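Corollary 2's negative binomial bound depends on the mean-dispersion parameterization above; the minimal check below (our simulation; NumPy's $(n, p)$ negative binomial maps to it via $n = k$, $p = k/(k+\mu)$) confirms the stated mean and variance.

```python
import numpy as np

# Check (ours) that NB(mu, k) with pmf
#   P(Y=y) = Gamma(y+k)/(Gamma(k) y!) * (k/(k+mu))^k * (mu/(k+mu))^y
# has mean mu and variance mu + mu^2/k.
mu, k = 3.0, 2.0
rng = np.random.default_rng(1)
y = rng.negative_binomial(n=k, p=k / (k + mu), size=1_000_000)
print(y.mean(), mu)                # ~3.0
print(y.var(), mu + mu**2 / k)     # ~7.5
```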
Corollary 3 (properties of sub-Weibull r.v.s) gives equivalent characterizations: a tail bound $\mathrm{P}(|X| \ge t) \le 2\exp\{-(t/\|X\|_{\psi_\theta})^\theta\}$ for all $t \ge 0$, and then a moment growth bound, e.g. of the form $\|X\|_k \le (2\,\Gamma(k/\theta + 1))^{1/k}\|X\|_{\psi_\theta}$, for all $k \ge 1$. In particular, sub-Weibull r.v.s reduce to sub-exponential or sub-Gaussian r.v.s when $\theta = 1$ or $\theta = 2$, respectively. Obviously, the smaller $\theta$ is, the heavier the tail of the r.v. A r.v. is called heavy-tailed if its distribution function fails to be bounded by a decreasing exponential function, i.e. $\int e^{\lambda x}\,dF(x) = \infty$ for all $\lambda > 0$ (the tail decays more slowly than that of any exponential r.v.); see Foss et al. (2011). Hence for sub-Weibull r.v.s, we usually focus on the sub-Weibull index $\theta \in (0, 1)$. A simple example of such heavy-tailed distributions arises when we take products of sub-Gaussian r.v.s. Via a power transform of $|X|$, the next corollary explains the relation between the sub-Weibull norms with parameters $\theta$ and $r\theta$, which is similar to Lemma 2.7.6 of Vershynin (2018) for the sub-exponential norm.
By Corollary 4 with $r = 1/d$, the $d$-th root of the absolute value of a sub-Gaussian r.v. is $\mathrm{subW}(2d)$. Corollary 4 can be extended to products of r.v.s; from Proposition D.2 in Kuchibhotla and Chakrabortty (2022), with the equality replaced by an inequality, we state this as the following proposition.
For multi-armed bandit problems in reinforcement learning, Hao et al. (2019) move beyond sub-Gaussianity and consider rewards following sub-Weibull distributions, which have much weaker tails. The corresponding concentration inequality (Theorem 3.1 in Hao et al. (2019)) for the sum of independent sub-Weibull r.v.s is as follows.
Proposition 3 (Concentration inequality for sub-Weibull distributions). Suppose $\{X_i\}_{i=1}^n$ are independent sub-Weibull random variables with $\|X_i - \mathrm{E}X_i\|_{\psi_\theta} \le v$. Then there exist constants $C_{1\theta}$ and $C_{2\theta}$ depending only on $\theta$ such that, with probability at least $1 - e^{-t}$,
$$|S_n^a| \le C_{1\theta}\, v\, \|a\|_2 \sqrt{t} + C_{2\theta}\, v\, \|a\|_\infty\, t^{1/\theta}, \qquad S_n^a := \sum_{i=1}^n a_i (X_i - \mathrm{E}X_i).$$
The weakness of Proposition 3 is that the upper bound of $S_n^a$ involves unspecified constants $C_{1\theta}$ and $C_{2\theta}$. In the next section, we give a constants-specified, high-probability upper bound for $|S_n^a|$, which improves Proposition 3 and is sharper than Theorem 3.1 in Kuchibhotla and Chakrabortty (2022).

Main results: concentrations for sub-Weibull summation
Based on the exponential moment condition, Chernoff's trick (recalled below) yields the following sub-exponential concentration from Proposition 4.2 in Zhang and Chen (2021).
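For completeness, the trick is simply exponentiation plus Markov's inequality and independence:

```latex
% Chernoff's trick for S := \sum_i w_i (Y_i - E Y_i):
% exponentiate, apply Markov's inequality, factor the MGF by
% independence, then optimize over \lambda > 0.
\mathbb{P}(S \ge t)
  \le \inf_{\lambda > 0} e^{-\lambda t}\, \mathbb{E}\, e^{\lambda S}
  = \inf_{\lambda > 0} e^{-\lambda t} \prod_{i=1}^{n}
    \mathbb{E}\, e^{\lambda w_i (Y_i - \mathbb{E} Y_i)} .
```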
Proposition 4. For any independent r.v.s $\{Y_i\}_{i=1}^n$ satisfying $\|Y_i\|_{\psi_1} < \infty$, any $t \ge 0$, and non-random weights $w = (w_1, \cdots, w_n)^\top$, a two-rate exponential bound holds for $\mathrm{P}(|\sum_{i=1}^n w_i(Y_i - \mathrm{E}Y_i)| \ge t)$. But this is not easy to extend to general sub-Weibull distributions. From Corollary 4, $\mathrm{E}e^{\lambda^{1/\theta}|Y_i|^{1/\theta}}$ is bounded for some constant $K > 0$; however, such an MGF bound with $\theta = 1$ or 2 is not directly applicable for deriving the concentration of $\sum_{i=1}^n w_i(Y_i - \mathrm{E}Y_i)$ via independence and Chernoff's trick, since the MGF of a Weibull r.v. does not have a closed exponential form. Thanks to the tail probability derived from Orlicz-type norms, instead of using an upper bound for the MGF, an alternative method is given by Kuchibhotla and Chakrabortty (2022), who define the so-called Generalized Bernstein-Orlicz (GBO) norm. The GBO norm helps us derive tail behaviours for sub-Weibull r.v.s.
The monotone function $\Psi_{\theta,L}(\cdot)$ is motivated by the classical Bernstein inequality for sub-exponential r.v.s. Like the sub-Weibull norm properties in Corollary 3, the following proposition from Kuchibhotla and Chakrabortty (2022) allows us to obtain the concentration inequality for an r.v. with finite GBO norm.
With an upper bound on the GBO norm, we can easily derive the concentration inequality for a single sub-Weibull r.v., or even for the sum of independent sub-Weibull r.v.s. Sharper upper bounds on the GBO norm are obtained for sub-Weibull sums, which refines the constants in the sub-Weibull concentration inequality. Let $\|X\|_p := (\mathrm{E}|X|^p)^{1/p}$ for all integers $p \ge 1$. First, by truncating more precisely, we obtain a sharper upper bound for $\|X\|_p$, compared with Proposition C.1 in Kuchibhotla and Chakrabortty (2022).
The proof can be found in the Appendix. Below, we need moment estimates for sums of independent symmetric r.v.s.
With the help of the three lemmas above, we can obtain the main results concerning the sharper, constants-specified concentration inequality for the sum of independent sub-Weibull r.v.s.
In this example, it can be seen that our method gives a much tighter confidence interval.
Remark 4. Theorem 1(b) generalizes the sub-Gaussian concentration inequalities, the sub-exponential concentration inequalities, and Bernstein's concentration inequality under Bernstein's moment condition. For $\theta < 2$ in Theorem 1(c), the tail behaviour of the sum is akin to a sub-Gaussian tail for small $t$, while the tail resembles a Weibull tail of order $\theta$ for large $t$; for $\theta > 2$, the tail behaves like a Weibull r.v. with tail parameter $\theta$ for small $t$, and the tail of the sum matches the sub-Gaussian tail for large $t$. The intuition is that the centered sum concentrates around zero by the law of large numbers. Theorem 1 shows that the convergence rate is faster for small deviations from the mean and slower for large deviations from the mean.
Remark 5. A similar result, recently presented in Vladimirova et al. (2020), is of the form
$$\mathrm{P}\Big(\Big|\sum_{i=1}^n (X_i - \mathrm{E}X_i)\Big| \ge t\Big) \le 2\exp\{-(t/(nK_\theta))^\theta\},$$
where $K_\theta$ is a constant depending only on $X$ and $\theta$ ($K_\theta$ can be obtained by Proposition 3). But this large deviation result obviously cannot guarantee a $\sqrt{n}$-convergence rate, whereas our result always gives a $\sqrt{n}$-convergence rate for small deviations, as presented in Theorem 1(c) and Proposition 3.

Sub-Weibull parameter
In this part, a new sub-Weibull parameter is proposed, which enables recovering the tight concentration inequality for a single non-zero-mean random vector. Similar to the characterizations of sub-Gaussian r.v.s in Proposition 2.5.2 of Vershynin (2018), sub-Weibull r.v.s have equivalent definitions.
Definition 4 (Sub-Weibull r.v., $X \sim \mathrm{subW}(\theta, v)$). Define the moment-based sub-Weibull norm $\|X\|_{\varphi_\theta}$ as the smallest constant $v > 0$ under which the normalized moments of $X$ stay bounded, and denote the corresponding r.v. by $X \sim \mathrm{subW}(\theta, v)$. This norm obeys a triangle-type inequality for $r \ge 1$, comparable to Proposition 1. Definition 4 is free of bounding the MGF, and it avoids Stirling's approximation in the proof of the tail inequality. We obtain the following main results for this moment-based norm.
Theorem 2 (sub-Weibull concentration). Suppose that there are $n$ independent sub-Weibull r.v.s in the sense of Definition 4; then a constants-specified concentration inequality holds for their sum. The proof of Theorem 2 can be found in Section 6.7. The concentration in Theorem 2 will serve a critical role in much of the statistical and machine learning literature. For instance, the sub-Weibull concentrations in Hao et al. (2019) contain unknown parameters, which makes the algorithm for general sub-Weibull random rewards infeasible. Using our results, it becomes feasible, as we give explicit constants in these concentrations.
Importantly, the sub-exponential parameter is the special case of the sub-Weibull norm with $\theta = 1$; denote it by $\|X\|_{\varphi_1}$ for an r.v. $X$. Another case of the sub-Weibull norm is $\theta = 2$, which defines the sub-Gaussian parameter $\|X\|_{\varphi_2}$. Like the generalized method of moments, we can give a higher-moment estimation procedure for the norm $\|X\|_{\varphi_2}$. Unfortunately, the method in Remark 1 based on estimating the MGF is not stable in simulations, since the exponential function has a massive variance in some cases.
• Estimation procedure for $\|X\|_{\varphi_2}$ and $\|X\|_{\varphi_1}$. Consider the estimation as a discrete optimization problem over the moment order: we take $k_{\max}$ big enough and optimize the normalized empirical moments over $1 \le k \le k_{\max}$, which gives the estimator (8). At first glimpse, the bigger $p$ is, the larger $n$ is required by this method. Nonetheless, most common distributions only require a medium-sized $p$ to give a relatively good result, so only a medium-sized $n$ is required in turn. For standard Gaussian, centralized Bernoulli (success probability $\mu = 0.3$), and uniformly distributed (on $[-1, 1]$) r.v.s, it can be shown that $\|X\|_{\varphi_2} \approx 1$, $0.4582576$, and $0.5773503$, respectively. Figures 1, 2, and 3 show the estimated values for different $n$ under the estimation method (8) for these three distributions. To the best of our knowledge, the estimation method (8) is a valid method for estimating the sub-Gaussian parameter.
For a centralized negative binomial and a centralized Poisson ($\lambda = 1$) variable $X$, $\|X\|_{\varphi_1} = 2.460938$ and $0.7357589$, respectively. Figures 4 and 5 show the estimated values for different $n$ under the estimation method (8) for these two distributions.
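To complement the figures, here is a numerical sketch of the moment-based procedure. It uses one concrete reading of Definition 4 (our assumption, chosen because it reproduces the reference values 1, 0.4582576, 0.5773503, and $2/e \approx 0.7357589$ quoted above): $\|X\|_{\varphi_2} = \sup_{k\ge 1}(\mathrm{E}X^{2k}/(2k-1)!!)^{1/(2k)}$ and $\|X\|_{\varphi_1} = \sup_{k\ge 1}(\mathrm{E}|X|^k/k!)^{1/k}$, with expectations replaced by sample means and the supremum truncated at $k_{\max}$.

```python
import numpy as np
from scipy.special import gammaln

def phi2_hat(x, k_max=10):
    """Moment-based sub-Gaussian parameter (our reading of Definition 4):
    sup_k ( E[X^(2k)] / (2k-1)!! )^(1/(2k)), empirical version."""
    vals = []
    for k in range(1, k_max + 1):
        m2k = np.mean(x ** (2 * k))
        # (2k-1)!! = (2k)! / (2^k k!); log-Gamma for numerical stability.
        log_ddf = gammaln(2 * k + 1) - k * np.log(2.0) - gammaln(k + 1)
        vals.append(np.exp((np.log(m2k) - log_ddf) / (2 * k)))
    return max(vals)

def phi1_hat(x, k_max=10):
    """Moment-based sub-exponential parameter: sup_k (E|X|^k / k!)^(1/k)."""
    return max(
        np.exp((np.log(np.mean(np.abs(x) ** k)) - gammaln(k + 1)) / k)
        for k in range(1, k_max + 1)
    )

rng = np.random.default_rng(2)
print(phi2_hat(rng.normal(size=200_000)))             # ~1
print(phi2_hat(rng.binomial(1, 0.3, 200_000) - 0.3))  # ~0.4583
print(phi2_hat(rng.uniform(-1, 1, 200_000)))          # ~0.5774
print(phi1_hat(rng.poisson(1.0, 200_000) - 1.0))      # ~2/e ~ 0.7358
```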
The five figures mentioned above show little bias between the estimated norm and the true norm. It is worth noting that the norm estimator in the centralized negative binomial case has a peak point. This is caused by sub-exponential distributions having relatively heavy tails, and hence the norm estimation may not be as robust as in the sub-Gaussian case under relatively small sample sizes. Moreover, the sub-Gaussian and sub-exponential parameters are extensible to random vectors with values in a normed space $(\mathcal{X}, \|\cdot\|)$: we define the norm-sub-Gaussian parameter and the norm-sub-exponential parameter by replacing $|X|$ with $\|X\|$ in the corresponding definitions.

Statistical Applications of Sub-Weibull Concentrations

Negative binomial regressions with heavy-tail covariates
In statistical regression analysis, the responses $\{Y_i\}_{i=1}^n$ in linear regressions are assumed to be continuous Gaussian variables. However, the categories in classification or grouping may be indexed by the non-negative integers and hence infinite in number. Categorical variables are treated as countable responses for distinguishing categories or groups. In practice, random count responses include the number of patients, of bacteria in a unit region, or of stars in the sky, and so on. The responses $\{Y_i\}_{i=1}^n$ with covariates $\{X_i\}_{i=1}^n$ belong to generalized linear regressions. We consider i.i.d. random variables $\{(X_i, Y_i)\}_{i=1}^n$. By the method of maximum likelihood or M-estimation, the estimator $\hat\beta_n$ is given by
$$\hat\beta_n := \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \ell(X_i^\top\beta, Y_i), \qquad (9)$$
where the loss function $\ell(\cdot, \cdot)$ is convex and twice differentiable in the first argument.
In high-dimensional regressions, the dimension of $\beta$ may grow with the sample size $n$. When $\{Y_i\}_{i=1}^n$ belongs to the exponential family, Portnoy (1988) studied the asymptotic behavior of $\hat\beta_n$ in generalized linear models (GLMs) as $p_n := \dim(X)$ increases. In our study, we focus on the case where the covariates are $\mathrm{subW}(\theta)$ heavy-tailed with $\theta < 1$.
The target vector is $\beta^* := \arg\min_{\beta} \mathrm{E}\,\ell(X^\top\beta, Y)$. The following so-called deterministic inequalities guarantee the $\ell_2$-error of the estimator obtained from the smooth M-estimation defined in (9). Lemma 4 (Corollary 3.1 in Kuchibhotla (2018)) states that, under its conditions, there exists a vector $\hat\beta_n \in \mathbb{R}^p$ satisfying $\hat Z_n(\hat\beta_n) = 0$, the estimating equation of (9), together with an $\ell_2$-error bound. Applications of Lemma 4 in regression analysis are of special interest when $X$ is heavy-tailed, i.e. the sub-Weibull index $\theta < 1$. For the negative binomial regression (NBR) with known dispersion parameter $k > 0$, the loss function is
$$\ell(u, y) = -yu + (y + k)\log(k + e^u). \qquad (10)$$
Thus we have $\dot\ell(u, y) = -\frac{k(y - e^u)}{k + e^u}$ and $\ddot\ell(u, y) = \frac{k(y + k)e^u}{(k + e^u)^2}$; see Zhang and Jia (2022) for details. Further computation gives $C(u, y) = \sup_{|s - t| \le u} \frac{e^s(k + e^t)^2}{(k + e^s)^2 e^t}$, which implies $C(u, y) \le e^{3u}$. Therefore, the resulting condition on $\max_{1\le i\le n} C(\cdot, \cdot)$ requires an assumption on the design space through $\max_{1\le i\le n}\|X_i\|_2$. To guarantee that $\hat\beta_n$ approximates $\beta^*$ well, some regularity conditions are required.
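A minimal numerical sketch of the NBR M-estimation (9)-(10) follows. The synthetic design, the cubed-Gaussian covariates (our choice of a $\mathrm{subW}(2/3)$ heavy-tailed design), and the BFGS optimizer are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import minimize

k = 2.0  # known dispersion parameter

def nb_loss(beta, X, y):
    """Average NBR loss (10): l(u, y) = -y*u + (y+k)*log(k + e^u),
    evaluated at u_i = x_i' beta; convex since d^2l/du^2 > 0."""
    u = X @ beta
    return np.mean(-y * u + (y + k) * np.log(k + np.exp(u)))

def nb_grad(beta, X, y):
    """Gradient, using dl/du = -k (y - e^u) / (k + e^u)."""
    u = X @ beta
    return X.T @ (-k * (y - np.exp(u)) / (k + np.exp(u))) / len(y)

rng = np.random.default_rng(3)
n_obs, dim = 2000, 5
Z = rng.standard_normal((n_obs, dim))
X = Z ** 3                               # |Z|^3 has subW(2/3) tails
beta_star = np.full(dim, 0.05)
mu = np.exp(X @ beta_star)
y = rng.negative_binomial(n=k, p=k / (k + mu))
fit = minimize(nb_loss, np.zeros(dim), args=(X, y), jac=nb_grad, method="BFGS")
print(np.linalg.norm(fit.x - beta_star))  # small l2-error
```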
In addition, to bound $\max_{1\le i\le n,\,1\le k\le p}|X_{ik}|$, the sub-Weibull concentration determines the order of the maximum by using Corollary 3 (see the display below). Hence, we define the event $\mathcal{F}_{\max}$ for the maximum of the design. To make sure that the optimization in (9) has a unique solution, we also require a minimal-eigenvalue condition.
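The display behind this event is just a union bound combined with Corollary 3's tail estimate; schematically:

```latex
% Union bound over the n*p design entries, each subW(theta) with
% \|X_{ik}\|_{\psi_\theta} \le K:
\mathbb{P}\Big( \max_{1\le i\le n,\,1\le k\le p} |X_{ik}| \ge t \Big)
  \le \sum_{i,k} \mathbb{P}(|X_{ik}| \ge t)
  \le 2np\, e^{-(t/K)^\theta},
% so the maximum is O_P\{ K (\log(np))^{1/\theta} \}.
```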
In the proof, to ensure that the random Hessian matrix has non-singular eigenvalues, we define the event $\mathcal{F}_R(n)$. Theorem 3 (Upper bound for the $\ell_2$-error) holds in the NBR with loss (10) under (C.1)-(C.3). A few comments on this theorem are in order. First, to get $\|\hat\beta_n - \beta^*\|_2 \xrightarrow{p} 0$, we need $p = o(n)$ under the sample size restriction (11). Second, note that the $\varepsilon_n$ in the probability $1 - 2p^2 c_n - \delta - \varepsilon_n$ depends on the model size and on the fluctuation of the design via the event $\mathcal{F}_{\max}$.

Non-asymptotic Bai-Yin's theorem
In statistical machine learning, exponential-decay tail probabilities are crucial for evaluating finite-sample performance. Unlike Bai-Yin's law, whose fourth-moment condition leads to polynomial-decay tail probabilities, under sub-Weibull conditions on the data we provide an exponential-decay tail probability for the extreme eigenvalues of an $n \times p$ random matrix.
Let $A = A_{n,p}$ be an $n \times p$ random matrix whose entries are independent copies of a r.v. with zero mean, unit variance, and finite fourth moment. Suppose that the dimensions $n$ and $p$ both grow to infinity while the aspect ratio $p/n$ converges to a constant in $[0, 1]$. Then Bai-Yin's law (Bai and Yin, 1993) asserts that the extreme singular values satisfy
$$s_{\min}(A) = \sqrt{n} - \sqrt{p} + o(\sqrt{p}), \qquad s_{\max}(A) = \sqrt{n} + \sqrt{p} + o(\sqrt{p}) \quad \text{almost surely}.$$
Next we introduce a special counting measure for the complexity of a certain set. A subset $N_\varepsilon \subseteq K$ is called an $\varepsilon$-net of $K$ in $\mathbb{R}^n$ if $K$ can be covered by balls with centers in $N_\varepsilon$ and radii $\varepsilon$ (under the Euclidean distance). The covering number $N(K, \varepsilon)$ is defined as the smallest number of closed balls with centers in $K$ and radii $\varepsilon$ whose union covers $K$.
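As a quick sanity check (our simulation, not part of any theorem here), the extreme singular values of a Gaussian matrix already hug the Bai-Yin limits $\sqrt{n} \pm \sqrt{p}$:

```python
import numpy as np

# Empirical illustration of Bai-Yin's law: for an n x p matrix with
# i.i.d. zero-mean unit-variance entries, the extreme singular values
# concentrate near sqrt(n) +/- sqrt(p).
rng = np.random.default_rng(4)
n, p = 4000, 1000
A = rng.standard_normal((n, p))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
print(s[-1], np.sqrt(n) - np.sqrt(p))    # s_min vs ~31.6
print(s[0],  np.sqrt(n) + np.sqrt(p))    # s_max vs ~94.9
```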
For the purpose of studying random matrices, we need to extend the definition of sub-Weibull r.v.s to sub-Weibull random vectors. The $n$-dimensional unit Euclidean sphere is denoted by $S^{n-1} = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$. We say that a random vector $X$ in $\mathbb{R}^n$ is sub-Weibull if the one-dimensional marginals $\langle X, a\rangle$ are sub-Weibull r.v.s for all $a \in \mathbb{R}^n$. The sub-Weibull norm of a random vector $X$ is defined as $\|X\|_{\psi_\theta} := \sup_{a \in S^{n-1}} \|\langle X, a\rangle\|_{\psi_\theta}$. Similarly, define the spectral norm of any $p \times p$ matrix $B$ as $\|B\|_2 := \sup_{x \in S^{p-1}} \|Bx\|_2$, i.e. the largest singular value of $B$. The spectral norm has many good properties; see Vershynin (2018) for details.
Furthermore, for simplicity, we assume that the rows of the random matrix are isotropic random vectors, i.e. $\mathrm{E}\langle X, a\rangle^2 = \|a\|_2^2$ for all $a \in \mathbb{R}^n$. In the non-asymptotic regime, Theorem 4.6.1 in Vershynin (2018) studies the upper and lower bounds on the maximum (minimum) eigenvalues of random matrices with independent sub-Gaussian entries sampled from high-dimensional distributions. As an extension of Theorem 4.6.1 in Vershynin (2018), the following result is a non-asymptotic version of Bai-Yin's law for sub-Weibull entries, which is useful for estimating covariance matrices from heavy-tailed data [$\mathrm{subW}(\theta)$, $\theta < 1$].
Theorem 4 (Non-asymptotic Bai-Yin's law). Let $A$ be an $n \times p$ matrix whose rows $A_i$ are independent isotropic sub-Weibull random vectors in $\mathbb{R}^p$ with covariance matrix $I_p$ and $\max_{1\le i\le n}\|A_i\|_{\psi_\theta} \le K$. Then for every $s \ge 0$, an exponential-decay bound on the deviations of the extreme singular values holds, with constants involving $K_\alpha := 2^{1/\alpha}$ if $\alpha \in (0, 1)$ and $K_\alpha = 1$ if $\alpha \ge 1$, and $A(\theta/2)$, $B(\theta/2)$, $C(\theta/2)$ defined in Theorem 1(a). Moreover, the concentration inequality for the extreme eigenvalues holds for $c \ge \sqrt{n \log 9 / p}$.

General Log-truncated Z-estimators and Sub-Weibull Type Robust Estimators

Motivated by the log-truncated losses in Chen et al. (2021) and Xu et al. (2022), we study an almost surely continuous and non-decreasing function $\varphi_c : \mathbb{R} \to \mathbb{R}$ that truncates the original score function:
$$-\log\big(1 - x + c(|x|)\big) \le \varphi_c(x) \le \log\big(1 + x + c(|x|)\big), \qquad (13)$$
where $c(|x|) > 0$ is a high-order function (Xu et al., 2022) of $|x|$, to be specified. For example, a plausible choice of $\varphi_c(x)$ satisfying (13) has the form
$$\varphi_c(x) = \mathrm{sign}(x)\log\big(1 + |x| + c(|x|)\big). \qquad (14)$$
For (14), we get $\varphi_c(x) \approx x$ for sufficiently small $x$ and $\varphi_c(x) \ll x$ for large $x$. Under (13), we now show that $c(|x|)$ must obey a key inequality: for all $x \in \mathbb{R}$, it suffices to verify $\{1 + x + c(|x|)\}\{1 - x + c(|x|)\} \ge 1$. Given observations $\{X_i\}_{i=1}^n$ and using the score function (14), we define the score function of the data; the influence of heavy-tailed outliers is then weakened through $\varphi_c[\alpha_n(X_i - \theta)]$ by choosing an optimal $\alpha_n$. We aim to estimate the average mean $\bar\mu_n := \frac{1}{n}\sum_{i=1}^n \mathrm{E}X_i$ via the Z-estimator
$$\hat\theta_{\alpha_n}: \quad \sum_{i=1}^n \varphi_c\big(\alpha_n(X_i - \hat\theta_{\alpha_n})\big) = 0, \qquad (15)$$
where $\alpha_n$ is a tuning parameter (to be determined later).
To guarantee consistency of the log-truncated Z-estimator (15), we require the following assumptions on $c(\cdot)$.
• (C.1): For a constant $c_2 > 1$, the function $c(x)$ satisfies a weak triangle inequality and a scaling property.

In the following theorem, we establish the finite-sample confidence interval and the convergence rate of the estimator $\hat\theta_{\alpha_n}$. Theorem 5. Let $\{X_i\}_{i=1}^n$ be independent samples drawn from unknown probability distributions $\{P_i\}_{i=1}^n$ on $\mathbb{R}$. Consider the estimator $\hat\theta_{\alpha_n}$ defined by (15) under (C.1), with $\alpha_n \to 0$, and let $B_n^+(\theta)$ and $B_n^-(\theta)$ be defined in terms of $n$ and $\alpha_n$. Let $\theta_+$ be the smallest solution of the equation $B_n^+(\theta) = 0$ and $\theta_-$ the largest solution of $B_n^-(\theta) = 0$. (a) We have $(1 - 2\delta)$-confidence intervals for any $\delta \in (0, 1/2)$ once the sample size condition is satisfied. Inequality (17) in Theorem 5 is a fundamental extension of Lemma 2.1 (see Theorem 16 in Lerasle (2019)) with $c(x) = x^2/2$ from i.i.d. samples to independent samples. With $c(x) = |x|^\beta/\beta$ for i.i.d. samples, Theorem 5 implies Lemmas 2.3 and 2.4 and Theorem 2.1 in Chen et al. (2021). The bound in Theorem 5(b) gives a theoretical guarantee for choosing the tuning parameter $\alpha_n$.
Proposition 7 (Theorem 2.1 in Chen et al. (2021)). Let $\{X_i\}_{i=1}^n$ be a sequence of i.i.d. samples drawn from an unknown probability distribution on $\mathbb{R}$, under suitable moment assumptions; compared with the convergence rate in (18), this gives a rate of polynomial order in $n$. For example, consider the Pareto distribution $\mathrm{Pareto}(\alpha, k)$ with shape parameter $\alpha > 0$ and scale parameter $k > 0$, whose density function is $f(x) = \alpha k^\alpha x^{-(\alpha+1)}$ for $x \ge k$. For $\alpha \le 2$, $\mathrm{Pareto}(\alpha, k)$ has infinite variance, and it does not belong to the sub-Weibull family; neither does the sample mean of i.i.d. Pareto-distributed data. Proposition 7 shows that the estimation error of the robust mean estimator enjoys a sub-Weibull concentration as presented in Proposition 3, without a finite sub-Weibull norm assumption on the data. This Weibull-tailed behavior motivates us to define general sub-Weibull estimators having the non-parametric convergence rate $O(n^{-1/\theta})$ in Proposition 3 for $\theta > 2$, even if the data do not have a finite sub-Weibull norm.
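To make the discussion concrete, here is a minimal sketch of the log-truncated Z-estimator (15) on i.i.d. Pareto(1.5, 1) data (true mean 3, infinite variance). The Catoni-type score from (14) with $c(x) = x^2/2$, the heuristic tuning $\alpha_n = n^{-1/\alpha}$, and Brent root-finding are our implementation choices, not the paper's prescriptions.

```python
import numpy as np
from scipy.optimize import brentq

def phi_c(x):
    """Log-truncated score (14) with c(x) = x^2/2 (Catoni-type):
    phi_c(x) = sign(x) * log(1 + |x| + x^2/2).  It behaves like x near
    zero and grows only logarithmically for large |x|."""
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x ** 2)

def truncated_mean(x, alpha):
    """Z-estimator (15): solve sum_i phi_c(alpha * (x_i - theta)) = 0.
    The score is decreasing in theta, positive at x.min() and negative
    at x.max(), so a root exists in [x.min(), x.max()]."""
    score = lambda theta: np.sum(phi_c(alpha * (x - theta)))
    return brentq(score, x.min(), x.max())

rng = np.random.default_rng(5)
n, a, k0 = 10_000, 1.5, 1.0
x = k0 * (1.0 + rng.pareto(a, size=n))   # classical Pareto(1.5, 1), mean 3
alpha = n ** (-1.0 / a)                  # heuristic tuning, our choice
print(truncated_mean(x, alpha), x.mean())  # robust estimate vs sample mean
```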

Conclusions
Concentration inequalities are broadly useful in high-dimensional statistical inference and machine learning. They facilitate various explicit non-asymptotic confidence intervals as functions of the sample size and model dimension.
Future research includes a sharper version of Theorem 2, which is crucial for constructing non-asymptotic, data-driven confidence intervals for the sub-Weibull sample mean. Although we have obtained sharper upper bounds for sub-Weibull concentrations, lower bounds on tail probabilities are also important in some statistical applications (Zhang and Zhou, 2020). Developing non-asymptotic and sharp lower tail bounds for Weibull r.v.s is left for further study. For the negative binomial concentration inequalities in Corollary 2, it would also be interesting to study concentration inequalities for COM-negative binomial distributions (see Zhang et al. (2018)).

Acknowledgement
This work is supported in part by National Natural Science Foundation of China Grant (12101630) and the University of Macau under UM Macao Talent Programme (UMMTP-2020-01). This work is also supported in part by the Key Project of Natural Science Foundation of Anhui Province Colleges and Universities (KJ2021A1034) and the Key Scientific Research Project of Chaohu University (XLZ-202105). The authors thank Guang Cheng for discussions on doing statistical inference in a non-asymptotic way and Arun Kumar Kuchibhotla for his help with the proof of Theorem 1. The authors also thank Xiaowei Yang for his helpful comments on Theorem 5.

Appendix
Proof of Corollary 1. Since $\varphi_{|X|^\theta}(t)$ is continuous for $t$ in a neighborhood of zero, by the definition of the $\psi_\theta$-norm, $2 \ge \mathrm{E}\exp(|X|^\theta/\|X\|_{\psi_\theta}^\theta) = \varphi_{|X|^\theta}(\|X\|_{\psi_\theta}^{-\theta})$, and the claim follows by inverting $\varphi_{|X|^\theta}$.

6.1
Proof of Corollary 2. The first inequality is a direct application of (3), observing that for any constant $a \in \mathbb{R}$ and any r.v. $Y$, $\|aY\|_{\psi_1} = |a|\,\|Y\|_{\psi_1}$.
The second inequality is obtained from (3) by considering the two rates in the exponent separately. For (5), we only need to bound the sub-exponential norm of a negative binomial r.v.; the third inequality then follows from the first inequality and the definition of $a(\mu_i, k_i)$.

6.2
Proof of Corollary 3. The first and second parts of this proposition were shown in Lemma 2.1 of Zajkowski (2019). For the third result, using the bounds on the Gamma function [see Jameson (2015)]
$$\sqrt{2\pi x}\,(x/e)^x \le \Gamma(x + 1) \le \sqrt{2\pi x}\,(x/e)^x e^{1/(12x)}, \qquad (x > 0),$$
it gives the stated bound.

6.3
Proof of Corollary 4. By the definition of the $\psi_\theta$-norm, $\||X|^r\|_{\psi_{\theta/r}} = \inf\{C > 0 : \mathrm{E}\exp(|X|^\theta/C^{\theta/r}) \le 2\} = \|X\|_{\psi_\theta}^r$. The result $|X|^r \sim \mathrm{subW}(\theta/r)$ follows by the definition of the $\psi_{\theta/r}$-norm again. Moreover, $\||X|^r\|_{\psi_{\theta/r}} = \|X\|_{\psi_\theta}^r$ holds with equality.

6.5
The main idea of the proof is to use sharper estimates of the GBO norm of the sum of symmetric r.v.s.
Proof of Theorem 1.
(a) Without loss of generality, assume $\|X_i\|_{\psi_\theta} = 1$. Define $Y_i := (|X_i| - (\log 2)^{1/\theta})_+$; then it is easy to check that $\mathrm{P}(|X_i| \ge t) \le 2e^{-t^\theta}$ implies $\mathrm{P}(Y_i \ge t) \le e^{-t^\theta}$. For independent Rademacher r.v.s $\{\varepsilon_i\}_{i=1}^n$, the symmetrization inequality gives (21). From Lemma 2, we handle the first term in (21) with the sum of symmetric r.v.s. Since $\mathrm{P}(Y_i \ge t) \le e^{-t^\theta}$, the required tail condition holds. Next, we proceed with the proof by checking the moment conditions in Corollary 5.
Case $\theta \le 1$: in the last inequality we use the moment bound above. Using homogeneity, we can assume that $\sqrt{p}\,\|w\|_2 + p^{1/\theta}\|w\|_\infty = 1$; then $\|w\|_2 \le p^{-1/2}$ and $\|w\|_\infty \le p^{-1/\theta}$. Therefore, for $p \ge 2$, the claimed bound holds, where the last inequality follows from the fact that $p^{1/p} \le 3^{1/3}$ for any integer $p \ge 2$. Hence, following Corollary 5, we obtain the stated concentration. Indeed, the positive limit can be argued by (2.2) in Alzer (1997). Then, by the monotonicity property of the GBO norm, the conclusion follows. By Lemmas 2 and 3(b), for $p \ge 2$, the corresponding moment bound holds, and the result follows by Corollary 5. Similarly, the case $\theta > 2$ follows.

6.6
Proof of Corollary 6. Using the definition of $\|X\|_{\varphi_\theta}$, the result follows.

6.7
Proof of Theorem 2. Minkowski's inequality for $p \ge 1$ and the definition of $\|X\|_{\varphi_\theta}$ imply a bound for the $p$-th moment of the weighted sum over $i = 1, \ldots, n$, where the last inequality follows by taking $k \mapsto 2k$ in Corollary 3(b).

6.8
Proof of Theorem 3. Note that the decomposition below holds for all $b \in \mathbb{R}^p$. For the first term, under $\mathcal{F}_{\max}$ with $t = C_{\min}/4$, we use $ke^{X_i^\top\beta^*}(k + e^{X_i^\top\beta^*})^{-2} \le 1$, and the second-to-last inequality is from Corollary 2. For the second term, by Theorem 1,
$$\le \mathrm{P}\{\mathcal{F}_1, \mathcal{F}_{\max}\} + \mathrm{P}\{\mathcal{F}_2, \mathcal{F}_{\max}\} + \mathrm{P}(\mathcal{F}_R^c(n)).$$
Furthermore, under $\mathcal{F}_1 \cap \mathcal{F}_2 \cap \mathcal{F}_{\max}$, it gives the sample size condition (11) on $n$.

6.9
Proof of Theorem 4. For convenience, the proof is divided into three steps.

Figure 1: standard Gaussian
Figure 4: centralized negative binomial
we can conclude (a). (b) This follows from Proposition 5 and (a). (c) For ease of notation, put $L_n$ as defined in the proof.