Editorial

Editorial: New Advances in High-Dimensional and Non-Asymptotic Statistics

Huiming Zhang 1,* and Xiaowei Yang 2
1 Institute of Artificial Intelligence, Beihang University, Beijing 100191, China
2 College of Mathematics, Sichuan University, Chengdu 610041, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(14), 2267; https://doi.org/10.3390/math13142267
Submission received: 4 July 2025 / Accepted: 7 July 2025 / Published: 14 July 2025
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)

MSC:
62J07; 62H12; 62M20

1. Introduction

This editorial reviews the Special Issue “New Advances in High-Dimensional and Non-asymptotic Statistics” and summarizes the ten collected papers. For high-dimensional data analysis, we begin by discussing the challenges posed by datasets in which the number of covariates p is comparable to, or much larger than, the sample size n. Traditional statistical methods often fail in such scenarios, necessitating new approaches such as regularization (e.g., the Lasso) for variable selection and parameter estimation. We explore the theoretical underpinnings of these methods, including oracle inequalities that provide non-asymptotic bounds on estimation and prediction errors. We then turn to functional data analysis (FDA), where the data points are functions rather than scalars or vectors. Key concepts such as the Karhunen–Loève expansion and Mercer’s theorem for covariance operators are discussed, along with the challenges posed by discrete and noisy observations of functional data, distinguishing between dense and sparse sampling regimes. We also introduce the extension of these ideas to unstructured data residing in general metric spaces. The significance of this Special Issue lies in addressing the high-dimensional, infinite-dimensional, and otherwise complex data scenarios increasingly common in modern scientific research, thereby facilitating the development of non-asymptotic statistical inference and high-dimensional probability.

2. Background

In recent years, high-throughput and non-parametric complex datasets have become common in genomics, signal processing, neuroscience, and other scientific areas. For regression problems involving such datasets, we encounter statistical models in which both the number of covariates p and the sample size n grow, with p treated as a function of n, i.e., p = p(n). One scenario is that p grows slowly with n and satisfies p < n, so that maximum likelihood and moment estimators retain their consistency properties. This situation has been extensively studied, as shown in studies by Portnoy [1], He and Shao [2], Sur et al. [3], Zhang [4], Montanari et al. [5], and Li et al. [6]. Another scenario arises in large-scale data, where p may number in the thousands and greatly exceed n, yet only a few important predictors are truly active. For example, in microarray data analysis in genomics, p can be in the millions, while the number of observations n is only a few hundred or even a few dozen [7]. This phenomenon has motivated modern statisticians to focus more on high-dimensional statistics.
For high-dimensional data, the important variables are sparse among all covariates, meaning that the number of important variables s is much smaller than p, i.e., $s \ll p$. The main challenge is that directly applying traditional low-dimensional statistical inference and computational methods leads to high complexity. Fortunately, regularization (or penalization) methods can yield sparse parameter estimates, thereby enabling efficient variable selection. In high-dimensional data, typically only a small subset of features has a significant impact on the target variable, i.e., sparsity holds. A well-known regularization method is the Lasso, a shrinkage estimator that adds an $\ell_1$ penalty to the least squares loss of linear models; it was introduced by Tibshirani [8] to address estimation problems in high-dimensional linear models. Further material can be found in the monograph by Buhlmann and van de Geer [9] and the review paper by Zhang and Chen [10].
It is well-known that classical probability theory laid the mathematical foundation for traditional statistics. Cutting-edge statistical methods, keeping pace with the times, have also spawned new probability problems, which in turn have promoted the development of the probability field. For example, when p = o ( n ) , research methods for weak convergence properties in statistics differ from the central limit theorem for sums of fixed-dimensional random variables. The earliest research in this area was by Portnoy [1]. In recent years, the rapid development of cutting-edge statistical fields, such as high-dimensional covariance estimation, has in turn promoted the development of probability theory and even pure mathematics, including random matrices, large deviation inequalities, and geometric functional analysis; see Vershynin [11] and Tropp [12] for more details. These studies of probability theory related to high-dimensional statistics are collectively referred to as high-dimensional probability.
If the dimensionality of the model is particularly high (even larger than the sample size), asymptotic results for the error between model estimates and true parameters can be intractable or impossible to obtain with existing techniques. Most estimation errors in traditional statistical inference are stated as asymptotic errors as the sample size n tends to infinity. J. Friedman [13] introduced the term Asymptopia (merging “asymptotics” and “utopia”) to critique a common pitfall in statistical theory, namely the allure of asymptotic optimality results that, while mathematically appealing, frequently bear little practical relevance for the performance achievable with finite data. The rise of large language models and deep learning provides compelling empirical evidence for this critique, demonstrating how complex models can deliver exceptional performance in practical finite-sample regimes, thereby challenging the validity or utility of some theoretically derived asymptotic guarantees. For generalized linear models with “large p, small n” datasets, van de Geer [14] and Zhang and Jia [15] used the first-order (KKT) conditions of the Lasso optimization problem, combined with sub-Gaussian concentration inequalities from probability theory, to provide non-asymptotic optimal error bounds for Lasso estimators. When n and p are fixed, sub-Gaussian concentration and exponential tail probability inequalities (also known as large deviation inequalities) play a crucial role in deriving non-asymptotic upper bounds for the errors of high-dimensional estimators; see Wainwright [16] for more details. Concurrently, the non-asymptotic properties of high-dimensional variable selection problems have driven recent probability theory research with a machine learning and AI background; see Vershynin [11], Zhou et al. [17], and Zhang and Huang [18] for details.
The data variables described above are Euclidean and have certain limitations in practice. With the development of the natural sciences and engineering applications, continuous curves or image data are observed and collected by various scientific instruments [19]. As a branch of modern statistics, functional data analysis (FDA) focuses on analyzing complex data in spaces of continuous curves, surfaces, or other continuously varying forms. In the FDA framework, each sample element is considered a function in some abstract, infinite-dimensional space (e.g., $L^2$, RKHS, or Sobolev spaces); see [20]. Due to mathematical difficulties, the theoretical analysis of FDA presents certain challenges. In applications, observed time-varying functional data may consist of dense or sparse time sampling points, and the number of sampling points may exceed the sample size. Although discretely sampled functional curves look very similar to traditional multivariate data, the correlation between adjacent sampling points can be very high, and we cannot arbitrarily permute these sampling points. Furthermore, the empirical version of the covariance function (operator) for functional data provides a theoretical basis for various non-parametric estimation and inference procedures related to functional data. The covariance function is a function on a two-dimensional continuous domain, representing an infinite-dimensional generalization of the covariance matrix of multivariate data. Characterizing the eigenvalues of the covariance operator and analyzing their convergence properties require mathematical tools including linear functional analysis, operator perturbation theory [21], and probability in abstract spaces; see Chapter 7 of Hsing and Eubank [22] for more information. Compared with statistical theory in Euclidean spaces, the technical details involved in deriving convergence rates for various function estimates in functional data models are very complex; see [23,24,25].
Functional data in $L^2$ space can also be generalized to complex functional data composed of random elements (or random objects) in a metric space $(Q, d)$. Big data generally comes in two types, namely structured data and unstructured data. Structured data is highly organized and neatly formatted, usually stored in tabular form. Unstructured data generally refers to data that is difficult to structure, such as images, documents, videos, emails, social media, and websites; all of these can be regarded as random elements in a general metric space. Statistical applications and theory for structured data, a mature data type, have been studied relatively thoroughly, whereas the statistical analysis of unstructured data is an emerging field of significant scientific importance and applied value for AI. For example, the recent literature includes a study by Tavakoli et al. [26] on spatial text data analysis; Petersen and Müller [27] on brain functional connectivity in patients with mild cognitive impairment and Alzheimer’s disease based on Wasserstein covariance; and Chen et al. [28] on high-dimensional recurrent event data analysis using a low-dimensional dynamic factor model.

3. Current Research Status

The current research status related to this Special Issue can be divided into two parts as follows: (1) a theoretical review of high-dimensional penalized regression and (2) a literature review of functional data and functional regression models.

3.1. High-Dimensional Statistics

In machine learning and AI [17], the sub-Gaussian distribution is a common data assumption, originating from a paper on Fourier analysis by the French mathematician J. Kahane in the 1960s. Subsequently, the sub-Gaussian distribution family has been frequently used in information theory [29], probability theory [30], and mathematical statistics [7,9]. High-dimensional statistical theories often rely on sub-Gaussian data assumptions to obtain minimax convergence rates (Chapter 15 in [16]). A zero-mean random variable $X \in \mathbb{R}$ is called sub-Gaussian if there exists a sub-Gaussian variance proxy $\sigma^2$ such that its moment-generating function satisfies $\mathbb{E}[e^{tX}] \le e^{t^2\sigma^2/2}$ for all $t \in \mathbb{R}$. We denote a sub-Gaussian random variable as $X \sim \mathrm{subG}(\sigma^2)$. From the definition of sub-Gaussianity and Chernoff’s inequality, we obtain $P(|X| \ge t) \le 2\exp\!\left(-\frac{t^2}{2\sigma^2}\right)$. Note that $\sigma^2$ in the sub-Gaussian definition is any value satisfying the inequality: if a sub-Gaussian parameter $\sigma_0^2$ is found, then any number greater than $\sigma_0^2$ can also serve as a sub-Gaussian parameter, so the parameter in this definition is not identifiable. Buldygin and Kozachenko [30] defined the optimal sub-Gaussian parameter as
$$\sigma_{\mathrm{opt}}^2(X) := \inf\left\{\sigma^2 \ge 0 : \mathbb{E}[e^{tX}] \le e^{t^2\sigma^2/2}, \ \forall t \in \mathbb{R}\right\}.$$
The above $\sigma_{\mathrm{opt}}^2(X)$ is called the optimal variance proxy. From the definition of $\sigma_{\mathrm{opt}}^2(X)$ and sub-Gaussianity, we have
$$\sigma_{\mathrm{opt}}^2(X) \ge \operatorname{Var}(X) \quad (\text{the optimal variance proxy is not less than the variance}).$$
When $\sigma_{\mathrm{opt}}^2(X) = \operatorname{Var}(X)$, the variable is called strictly sub-Gaussian; for example, Gaussian, symmetric Bernoulli, and uniform distributions are all strictly sub-Gaussian. Estimation of $\sigma_{\mathrm{opt}}^2(X)$ remains an open problem that can inspire much follow-up research in statistical machine learning, such as non-asymptotic interval estimation [31,32] and hypothesis testing requiring sub-Gaussian parameter estimates [33].
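As a quick numerical illustration (a minimal sketch of our own, assuming only NumPy; the grid of t values and the plug-in construction are illustrative choices, not an estimator from the cited works), one can check the moment-generating-function bound empirically and compare a crude plug-in variance proxy with the sample variance for a strictly sub-Gaussian law such as the symmetric Bernoulli (Rademacher) distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.choice([-1.0, 1.0], size=n)   # Rademacher sample: strictly sub-Gaussian, Var(X) = 1

# Grid of t values bounded away from 0 (small |t| amplifies Monte Carlo noise).
ts = np.concatenate([np.linspace(-3.0, -0.25, 12), np.linspace(0.25, 3.0, 12)])
mgf = np.array([np.mean(np.exp(t * x)) for t in ts])      # empirical E[exp(tX)]

# For each t, the smallest sigma^2 with E[exp(tX)] <= exp(t^2 sigma^2 / 2);
# taking the maximum over the grid gives a crude plug-in estimate of the variance proxy.
sigma2_plugin = np.max(2.0 * np.log(mgf) / ts ** 2)

print(f"plug-in variance proxy ~ {sigma2_plugin:.3f}, sample variance ~ {x.var():.3f}")
# For strictly sub-Gaussian laws (Gaussian, Rademacher, uniform) the two nearly coincide.
```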
For the theory, Knight and Fu [34] proved that Lasso estimators are not asymptotically normal, and they argued that the exact and limiting distributions of Lasso estimators are difficult to derive and do not have a clear form. To avoid this trouble, a popular subsequent approach is to derive non-asymptotic error bounds (oracle inequalities) for the desired high-dimensional estimators, defined as items 1 and 2 below. For Lasso-type estimators in growing-dimension models $p = n^{\alpha}$ ($\alpha \ge 1$) without restrictive sparsity or eigenvalue conditions, Greenshtein and Ritov [35] provided an oracle inequality for the prediction error. In classical consistency analysis, the model size p is fixed and the sample size n tends to infinity. However, when both p and n tend to infinity, a common technique is to use non-asymptotic error upper bounds to derive the consistency of high-dimensional estimators. Let $\beta^*$ be the true regression coefficient in a high-dimensional regression model. Assume $\{Y_i, X_i\}_{i=1}^n \overset{\mathrm{i.i.d.}}{\sim} (X, Y)$ satisfies $\mathbb{E}(Y_i \mid X_i) = f(X_i^{\mathrm T}\beta^*)$, where $X_i$ is a p-dimensional covariate, $Y_i \in \mathbb{R}$ is a Euclidean response variable, and f is a given function. Theoretical properties of interest in high-dimensional regression include (a) the fluctuation behavior of the penalized estimator $\hat\beta$ and (b) whether $\hat\beta$ is consistent in some sense of distance under certain growth conditions on p. For $0 < q \le \infty$, we write $\|\beta\|_q := (\sum_{i=1}^p |\beta_i|^q)^{1/q}$ for the $\ell_q$-norm of a p-dimensional vector $\beta$. Generally, we are interested in two types of non-asymptotic error upper bounds for penalized estimators:
  1. Prediction error (predictive consistency): $\hat\beta$ performs well on a test sample X, with a prediction error satisfying
     $$\mathbb{E}_X\big[(X^{\mathrm T}(\hat\beta - \beta^*))^2\big] \le R_n^{\mathrm{pre}}(\beta^*) \quad (\text{or its empirical version}),$$
     where $R_n^{\mathrm{pre}}(\beta^*)$ is a non-asymptotic convergence function (converging to 0 with n).
  2. $\ell_q$-estimation error (or other norms): $\hat\beta$ approximates the true parameter $\beta^*$. With high probability,
     $$\|\hat\beta - \beta^*\|_q \le R_n^{\mathrm{est}}(\beta^*),$$
     where $R_n^{\mathrm{est}}(\beta^*)$ is a non-asymptotic convergence function.
Both Types 1 and 2 are called oracle inequalities. Consider the Lasso-penalized linear model $Y = X\beta^* + \varepsilon$ with $\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_n$. Under the assumptions of sub-Gaussian response variables, sparsity $s := \|\beta^*\|_0$, and a restricted eigenvalue $\gamma > 0$ for the design matrix, one can show that Type 1 and Type 2 oracle inequalities hold [10,36]:
$$\|\hat\beta - \beta^*\|_1 \le \frac{12 A \sigma}{\gamma}\, s\sqrt{\frac{\log p}{n}}, \qquad \frac{1}{n}\big\|X(\hat\beta - \beta^*)\big\|_2^2 \le \frac{9 A \sigma\, s\log p}{\gamma\, n}$$
with probability at least $1 - 2p^{1 - A^2/8}$ (here, A is a constant with $A > 2\sqrt{2}$). Blazere et al. [37] and Wainwright [16] (Corollary 9.26), respectively, provided two versions of oracle inequalities for the $\ell_1$-estimation error in generalized linear models under restricted eigenvalue conditions. Proving oracle inequalities requires more probability-inequality techniques and is usually more involved than the limit arguments of asymptotic analysis; the ideas and techniques originate from proofs in non-parametric penalized regression [38]. Given n and p, oracle inequalities can provide deep insights into the non-asymptotic fluctuation properties of estimators. In recent years, oracle inequalities have seen significant progress in various complex regression models in statistics and machine learning, not limited to linear and generalized linear models. There is also research on Cox models with dependent structures and on Ising models, which requires exponential-type concentration inequalities for martingales and Markov chains as theoretical foundations; see the studies by Wei et al. [39] and Xiao et al. [40]. In complex data scenarios, Han et al. [41] proposed an online inference framework for high-dimensional semiparametric single-index models with unspecified link functions, focusing on regression parameter estimation. Fu et al. [42,43] introduced modern multiclass support vector machines and multinomial logistic regression models for high-dimensional and divergent settings, particularly scenarios in which both the number of features and the number of classes increase simultaneously. Zhang et al. [44] studied the problem of estimating the covariance matrix in large-dimension, small-sample-size scenarios. Tian and Qin [45] proposed theory and methods for estimating high-dimensional Toeplitz sparse block precision matrices and applied them to interval-valued time series modeling. For the prediction error, one can obtain predictive consistency without making too many assumptions about the model; see Zhuang and Lederer [46] for more. Lasso estimators are not asymptotically normal, but debiased Lasso estimators, obtained by correcting Lasso estimators, are asymptotically normal and can also be used to control the false discovery rate (FDR) in multiple hypothesis testing; see Javanmard and Javadi [47] and Han et al. [48] for more FDR methods.
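To make the scaling in these bounds concrete, the following minimal simulation sketch (our illustration, not code from the cited works; it assumes NumPy and scikit-learn, and the tuning constant is an arbitrary choice) fits a Lasso with a penalty level of order $\sigma\sqrt{\log p / n}$ and compares the resulting errors with the rates $s\sqrt{\log p/n}$ and $s\log p/n$ appearing above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s, sigma = 200, 1000, 5, 1.0            # "large p, small n" with sparsity s
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:s] = 2.0                           # s active coefficients
y = X @ beta_star + sigma * rng.standard_normal(n)

# Penalty level of order sigma * sqrt(log p / n), as in the oracle-inequality theory;
# the constant 2.0 in front is a rough choice, not the one appearing in the stated bound.
lam = 2.0 * sigma * np.sqrt(np.log(p) / n)
fit = Lasso(alpha=lam, fit_intercept=False).fit(X, y)

l1_err = np.sum(np.abs(fit.coef_ - beta_star))            # ell_1 estimation error
pred_err = np.mean((X @ (fit.coef_ - beta_star)) ** 2)    # in-sample prediction error
print(f"ell_1 error = {l1_err:.3f} (rate s*sqrt(log p/n) ~ {s*np.sqrt(np.log(p)/n):.3f})")
print(f"prediction error = {pred_err:.4f} (rate s*log p/n ~ {s*np.log(p)/n:.4f})")
```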

3.2. Infinite-Dimensional Statistics

When the dimensionality of Euclidean data tends to infinity, the data can be viewed as random processes in an infinite-dimensional $L^2$ space (also called functional data); this rests on the isomorphism between the $L^2$ space and the $\ell^2$ space. Assume there is a population random process $\{X(s)\}_{s\in[0,1]}$. Functional data are typically a set of independent and identically distributed random processes $\{X_i(s)\}_{i=1}^n \overset{\mathrm{i.i.d.}}{\sim} X(s)$. Using the idea of principal component analysis, the infinite-dimensional functional curve X(s) can be reduced to a linear combination of a series of basis functions. Let X(s) be a square-integrable random process whose mean function $\mu(s) := \mathbb{E}[X(s)]$ and covariance function $K(s,t) = \mathbb{E}\{[X(s) - \mu(s)][X(t) - \mu(t)]\}$ exist. The covariance function satisfies the spectral decomposition theorem (Mercer’s theorem):
$$K(s,t) = \sum_{j=1}^{\infty} \lambda_j \phi_j(s)\phi_j(t),$$
where $\{\lambda_j\}_{j\ge 1}$ are non-negative, non-increasing eigenvalues satisfying $\sum_{j=1}^{\infty}\lambda_j < \infty$, and each $\phi_j(t)$ is the corresponding eigenfunction. With Mercer’s theorem, X(s) has the following Karhunen–Loève expansion:
$$X(s) = \mu(s) + \sum_{j=1}^{\infty} \xi_j \phi_j(s),$$
where $\xi_j = \int_0^1 [X(s) - \mu(s)]\phi_j(s)\,ds$ are called functional principal component scores; they have zero mean and satisfy $\operatorname{Cov}(\xi_j, \xi_k) = \lambda_j\delta_{jk}$ (where $\delta_{jk} = 1$ if $j = k$ and 0 otherwise). The Karhunen–Loève expansion maps the $L^2$-space random process to an $\ell^2$-space sequence $\{\xi_j\}$. Thus, models involving the functional covariate X(s) can be approximated by models with covariates $\{\xi_j\}$ of increasing dimension [$p = n^{\alpha}$, $0 < \alpha \le 1$]. For real data, an estimated version of the covariance operator is obtained. The Karhunen–Loève expansion is truncated to a few leading terms with large variance (representing high information content), and the number of selected basis functions should grow with n; this balances the variance contributed by the data against the bias caused by the model approximation.
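The empirical counterpart of this expansion is straightforward to compute on a common observation grid. The sketch below (an illustrative grid-based FPCA example of our own, assuming NumPy; the Fourier eigenfunctions and eigenvalue decay are simulated choices) estimates the covariance function, its leading eigenvalues and eigenfunctions, and the principal component scores.

```python
import numpy as np

rng = np.random.default_rng(2)
n, grid = 300, np.linspace(0, 1, 101)         # n curves observed on a fine common grid
ds = grid[1] - grid[0]

# Simulate curves from a truncated Karhunen-Loeve expansion with a Fourier basis
# (illustrative choice; any square-integrable process would do).
J = 20
lambdas = 1.0 / np.arange(1, J + 1) ** 2                          # decaying eigenvalues
phis = np.sqrt(2) * np.array([np.sin((j + 1) * np.pi * grid) for j in range(J)])
scores = rng.standard_normal((n, J)) * np.sqrt(lambdas)
X = scores @ phis                                                 # n x 101 matrix of curves

# Empirical mean and covariance function on the grid, then its eigendecomposition.
mu_hat = X.mean(axis=0)
Xc = X - mu_hat
K_hat = Xc.T @ Xc / n                                             # estimate of K(s, t) on the grid
evals, evecs = np.linalg.eigh(K_hat)
order = np.argsort(evals)[::-1]
lam_hat = evals[order] * ds                   # Riemann correction: operator eigenvalues
phi_hat = evecs[:, order].T / np.sqrt(ds)     # eigenfunctions normalized in L^2[0, 1]

xi_hat = Xc @ phi_hat[:3].T * ds              # first three estimated FPC scores per curve
print("estimated eigenvalues:", np.round(lam_hat[:3], 3),
      " score variances:", np.round(xi_hat.var(axis=0), 3),
      " true:", np.round(lambdas[:3], 3))
```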
For example, in scalar-on-function regression, we have
$$Y = \int_0^1 \beta^*(s) X(s)\,ds + \varepsilon,$$
where $\varepsilon$ is Gaussian and $\beta^*(s)$ is the unknown slope function. For ease of theoretical derivation, we first assume that X(s) is fully observed in time s. When $\beta^*(t)$ satisfies certain smoothness conditions and X(s) satisfies suitable moment conditions (eigenvalue decay conditions associated with Mercer’s theorem), Hall and Horowitz [49] rigorously derived the optimal convergence rate of the estimated slope function $\hat\beta(s)$ based on functional principal component analysis (FPCA). Dou et al. [50] further extended this to generalized functional regression models, where the response variable belongs to an exponential family. For sample curve i, assume that, conditional on the random process $\{X_i(s): s\in[0,1]\}$, the response variable $Y_i$ follows an exponential-family density $f_{Y_i}(y_i) \propto \exp\{\theta_i y_i - \psi(\theta_i)\}$ [51]:
$$\theta_i = \int_0^1 X_i(t)\beta(t)\,dt =: \langle \beta, X_i\rangle = \sum_{k=1}^{\infty}\beta_k\langle X_i, \phi_k\rangle =: \sum_{k=1}^{\infty}\beta_k\,\xi_{ik},$$
where $\beta(t) = \sum_{k=1}^{\infty}\beta_k\phi_k(t) \in L^2[0,1]$ is the expansion of the unknown slope function with Fourier coefficients $\{\beta_k\}$. Without loss of generality, assume $\mu(s) = 0$. From the fully observed curves $\{X_i(s): s\in[0,1]\}_{i=1}^n$, the empirical version of the covariance function and its corresponding spectral decomposition are obtained:
$$\hat K(s,t) = \frac{1}{n}\sum_{i=1}^n X_i(s)X_i(t) = \sum_{j=1}^{\infty}\hat\lambda_j\,\hat\phi_j(s)\hat\phi_j(t),$$
where $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge 0$ are the non-increasing eigenvalues of the covariance operator $\hat K$. Define the vector of the first m estimated principal component scores $\hat\xi_i := (\hat\xi_{i,1},\ldots,\hat\xi_{i,m})^{\mathrm T}$, where $\hat\xi_{ik} = \langle X_i, \hat\phi_k\rangle$ and $m := m_n \to \infty$ as $n \to \infty$. Note that the true log-likelihood function with respect to $\theta$ is $\sum_{i=1}^n(\theta_i Y_i - \psi(\theta_i))$. To approximate this infinite-dimensional likelihood, consider a truncated likelihood function $L_n(\gamma)$ with an increasing number of parameters satisfying
$$L_n(\gamma) = \sum_{i=1}^n \left[(\xi_i^{\mathrm T}\gamma)\,Y_i - \psi(\xi_i^{\mathrm T}\gamma)\right] \approx \sum_{i=1}^n\big(\theta_i Y_i - \psi(\theta_i)\big).$$
Maximizing the above truncated likelihood function over $\mathbb{R}^m$ yields the truncated maximum likelihood estimate $\hat\gamma = (\hat\beta_1,\ldots,\hat\beta_m)^{\mathrm T}$. Then, the estimated slope function is defined as
$$\hat\beta(t) = \sum_{k=1}^m \hat\beta_k\,\hat\phi_k(t).$$
The handling of functional covariates as stochastic processes presents a major challenge in functional regression. Crucially, a functional covariate comprises infinitely many highly correlated predictors across its domain (observed discretely). This correlation structure is defined via the covariance operator. Estimating the slope function involves solving ill-posed inverse problems. To manage infinite dimensionality, one imposes regularity conditions on the hypothesized slope function space, ensuring tractable finite approximations. However, the convergence rates of slope estimators depend fundamentally on assumptions about the covariance operator’s eigenvalue decay and the slope function’s restricted space. Consequently, convergence rates are inherently nonparametric. State-of-the-art methods, such as FPCA using optimally truncated principal components and RKHS [52] through optimally tuned kernel ridge regression, yield optimal nonparametric procedures. Their convergence rates are critically dependent on the smoothness of both the slope function and the covariance operator.
Usually, we assume that the Fourier coefficients $\{\beta_k\}$ of the slope function and the population eigenvalues $\{\lambda_k\}$ satisfy the decay conditions
$$|\beta_k| \le C k^{-b}, \qquad R^{-1}k^{-a} \le \lambda_k \le \lambda_{k+1} + (R/k)^{a-1},$$
where, given the principal-component variance decay constant $a > 1$, the slope decay constant b satisfies the signal detection threshold condition $b > (a+3)/2$. If the number of retained principal component scores satisfies $m \asymp n^{1/(a+2b)}$, Dou et al. [50] proved that the $L^2$ error of $\hat\beta(s)$ achieves the nonparametric optimal convergence rate
$$\int_0^1\big[\hat\beta(s) - \beta(s)\big]^2\,ds = O_p\!\left(n^{-\frac{2b-1}{a+2b}}\right).$$
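The sketch below illustrates this truncation strategy in the Gaussian-response special case, where maximizing the truncated likelihood reduces to least squares on the estimated scores (our illustrative simulation, assuming NumPy; the decay exponents a and b, the basis, and the noise level are arbitrary choices), with the truncation level taken of order $n^{1/(a+2b)}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, G = 400, 101
grid = np.linspace(0, 1, G); ds = grid[1] - grid[0]

# Simulate functional covariates and a scalar response from a functional linear model
# with Fourier eigenfunctions; a and b control eigenvalue and slope-coefficient decay.
a, b, J = 2.0, 3.0, 30
phis = np.sqrt(2) * np.array([np.cos(j * np.pi * grid) for j in range(1, J + 1)])
lambdas = np.arange(1, J + 1) ** (-a)
betas = np.arange(1, J + 1) ** (-b)
scores = rng.standard_normal((n, J)) * np.sqrt(lambdas)
X = scores @ phis
beta_true = betas @ phis
y = X @ beta_true * ds + 0.5 * rng.standard_normal(n)

# FPCA of the sample covariance, then regress y on the first m estimated scores,
# with the truncation level of order n^(1/(a + 2b)) as in the stated rate.
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
order = np.argsort(evals)[::-1]
phi_hat = evecs[:, order].T / np.sqrt(ds)
m = max(1, int(np.ceil(n ** (1.0 / (a + 2 * b)))))
S = Xc @ phi_hat[:m].T * ds                         # n x m estimated score matrix
gamma_hat, *_ = np.linalg.lstsq(S, y - y.mean(), rcond=None)
beta_hat = gamma_hat @ phi_hat[:m]                  # estimated slope function on the grid

l2_err = np.sum((beta_hat - beta_true) ** 2) * ds
print(f"m = {m}, integrated squared error = {l2_err:.4f}")
```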
Unfortunately, due to instrument limitations, actual functional curve data cannot be fully observed: sampling occurs at discrete time points. In practice, we usually observe the n sample curves $\{X_i(s): s\in[0,1]\}_{i=1}^n$ at N discrete time points, and the measurements may be contaminated by measurement error. The actual measured values are represented by the following formula:
$$X_{ij} = X_i(T_j) + \varepsilon_{ij}, \qquad j = 1,\ldots,N; \quad i = 1,2,\ldots,n,$$
where the measurement errors $\{\varepsilon_{ij}\}$ are independent and identically distributed with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma_\varepsilon^2$, and $\{T_j\}$ may be fixed or random sampling time points. For theoretical convenience, Cai and Yuan [53] assumed that the observation times $\{T_j\}$ are independently and identically distributed according to the uniform distribution on $[0,1]$. In the literature on discretely observed functional data, it is usually assumed that $N = n^{\alpha}$, where $\alpha$ is defined as the sampling rate. If sampling is costly or objective conditions do not allow for it, curve sampling is sparse, which requires us to first smooth the discrete observations. There are two typical smoothing strategies. The first is pre-smoothing each curve, which is suitable for sufficiently dense functional data: when $\alpha \ge 5/4$, the mean and covariance function estimates based on pre-smoothed curves obtained by kernel smoothing achieve the parametric convergence rate $\sqrt{n}$; this type of sampling is called “ultra-dense” [54]. The second strategy is designed for sparse functional data: because each curve is under-sampled, an effective approach is to pool the observations from all n curves to estimate the mean and covariance functions, usually via local constant (or local linear) regression. With the mean and covariance function curves recovered, eigenvalues and eigenfunctions can be further estimated via Mercer’s theorem, as described by Yao et al. [55]. Cai and Yuan [53] and Zhang and Wang [56] proved that when $\alpha \ge 1/4$, the pooled estimates of the mean and covariance functions achieve the parametric convergence rate $\sqrt{n}$. Compared with pre-smoothing, which may lose population information, the pooling method over the n curves requires relatively fewer observations per curve to achieve parametric convergence rates; this theory is significant because it demonstrates that pooling is superior to pre-smoothing. Based on these results, a sampling rate $\alpha \ge 1/4$ is called “dense” and a sampling rate $\alpha < 1/4$ is called “sparse”. Recent advances in the theory of functional principal component analysis for discretely observed data and functional linear regression can be found in [57,58].
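As an illustration of the pooling idea for sparsely observed curves, the following sketch (our own example, assuming NumPy; the bandwidth, kernel, and simulated curves are illustrative choices, and in practice local linear smoothing with a cross-validated bandwidth is preferred) pools all time–measurement pairs across curves and applies a Nadaraya–Watson smoother to recover the mean function.

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 200, 5                                   # sparse design: few observations per curve
grid = np.linspace(0, 1, 101)

def mu(s):                                      # true mean function (illustrative)
    return np.sin(2 * np.pi * s)

T = rng.uniform(0, 1, size=(n, N))              # i.i.d. Uniform[0, 1] observation times
scores = rng.standard_normal((n, 1))
X_obs = mu(T) + scores * np.sqrt(2) * np.sin(np.pi * T) + 0.2 * rng.standard_normal((n, N))

# Pooled Nadaraya-Watson (local constant) estimator of the mean function:
# all n*N pairs (T_ij, X_ij) are pooled and smoothed jointly.
t_pool, x_pool = T.ravel(), X_obs.ravel()
h = 0.1                                          # bandwidth; chosen by cross-validation in practice
def mu_hat(s):
    w = np.exp(-0.5 * ((s - t_pool) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * x_pool) / np.sum(w)

est = np.array([mu_hat(s) for s in grid])
print("max abs error of pooled mean estimate:",
      round(float(np.max(np.abs(est - mu(grid)))), 3))
```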

4. Summary of the Special Issue Papers

4.1. Estimation for Partial Functional Multiplicative Regression Model by Liu et al. [59]

The increasing prevalence of functional data, encompassing entities like curves and manifolds, motivates this research. Multiplicative regression is apt for data featuring positive outcomes. This paper focuses on estimating parameters within the partial functional multiplicative regression model (PFMRM). Two criteria, the least absolute relative error (LARE) and least product relative error (LPRE), are employed. An approximation of the functional predictor and its associated slope function is achieved using functional principal component basis functions. Subject to standard regularity assumptions, the study establishes the convergence rate for the slope function estimator and demonstrates the asymptotic normality for the parametric slope vector estimator using both methodologies. The performance of these proposed techniques is evaluated through Monte Carlo simulations, and their practical utility is illustrated with an analysis of the Tecator dataset.

4.2. Sharper Concentration Inequalities for Median-of-Mean Processes by Teng et al. [60]

The median-of-mean (MoM) estimation technique provides a robust statistical tool for managing datasets affected by contamination. This study introduces a variance-aware MoM estimation approach that utilizes binomial distribution tail probabilities. Under moderate conditions, the derived bound for this method proves to be more stringent than the conventional Hoeffding bound. This enhanced method is subsequently applied to explore the concentration properties of variance-dependent MoM empirical processes and the sub-Gaussian intrinsic moment norm. Furthermore, the paper presents a bound for the variance-dependent MoM estimator when applied to distribution-free contaminated data.
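For readers less familiar with the basic construction, the following sketch (a generic textbook-style median-of-means illustration of our own, assuming only NumPy; it is not the variance-aware estimator developed in this paper) shows how splitting a contaminated, heavy-tailed sample into blocks and taking the median of the block means robustifies mean estimation.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 10_000, 30                                  # sample size and number of blocks
x = rng.standard_t(df=2.1, size=n)                 # heavy-tailed sample with mean 0
x[:10] += 1000.0                                   # ten grossly contaminated observations

# Median-of-means: split into K blocks, average within blocks, take the median.
blocks = np.array_split(rng.permutation(x), K)
mom = np.median([b.mean() for b in blocks])

print(f"sample mean = {x.mean():.3f}, median-of-means = {mom:.3f} (true mean 0)")
```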

4.3. On a Low-Rank Matrix Single-Index Model by Mai [61] (One Citation)

This investigation offers a theoretical analysis of a single-index model featuring a low-rank matrix. While recently introduced within biostatistics, the model’s theoretical underpinnings for the simultaneous estimation of its link function and coefficient matrix remain underexplored. Utilizing the PAC–Bayesian bounds technique, this study provides a comprehensive theoretical framework for this joint estimation task. The results afford a more profound comprehension of the model’s characteristics and its prospective utility across various domains.

4.4. Generalizations of the Kantorovich and Wielandt Inequalities with Applications to Statistics by Zhang et al. [62]

Leveraging the characteristics of positive definite matrices, mathematical expectations, and positive linear functionals within matrix spaces, this study derives both the Kantorovich and Wielandt inequalities for such matrices and associated random variables. Several novel Kantorovich-type inequalities are presented, addressing ordinary matrix products, Hadamard products, and the mathematical expectations of random variables. Additionally, the study explores various interesting unified and generalized forms of the Wielandt inequality applicable to positive definite matrices. These established inequalities are subsequently used to formulate an inequality concerning different correlation coefficients and to examine applications in assessing the relative efficiency of parameter estimation within linear statistical models.

4.5. Optimal Non-Asymptotic Bounds for the Sparse β Model by Yang et al. [63]

This article examines the sparse β model, incorporating an $\ell_1$ penalty, a prominent area in network data modeling for both statistical and social network analysis. A refined algorithm is introduced for parameter estimation within this proposed model. The algorithm’s efficacy is demonstrated by its connection to the proximal gradient descent method, a consequence of the loss function’s convexity. The study investigates estimation consistency and derives an optimal bound for the proposed estimator. Empirical support for the methodology’s effectiveness is provided through carefully designed simulation studies. These findings underscore the potential of this methodology to contribute to advanced network data analysis.

4.6. Non-Asymptotic Bounds of AIPW Estimators for Means with Missingness at Random by Wang and Deng [64] (One Citation)

Augmented inverse probability weighting (AIPW) is recognized for its double robustness in contexts of missing data and causal inference, ensuring consistent estimation if either the propensity score or outcome regression model is correctly specified. A key feature of AIPW is its capacity to achieve first-order equivalence with an oracle estimator (where nuisance parameters are known), even when fitted models do not converge at the parametric $\sqrt{n}$ rate. This research delves into the non-asymptotic characteristics of the AIPW estimator for inferring the population mean under missingness at random. Inferences for mean outcomes in both observed and unobserved groups are also explored.
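As background for this construction, the following sketch (a generic AIPW illustration under simulated missingness at random, assuming NumPy and scikit-learn; it is not the estimator or analysis from this paper) computes the AIPW mean estimate by combining an outcome regression with an inverse-probability-weighted residual correction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(6)
n = 5_000
x = rng.standard_normal((n, 2))
y = 1.0 + x @ np.array([1.0, -0.5]) + rng.standard_normal(n)       # full outcomes
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x[:, 0])))                     # MAR propensity
r = rng.binomial(1, p_obs)                                         # 1 = observed

# Fitted nuisance models: propensity score pi_hat(x) and outcome regression m_hat(x).
pi_hat = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]
m_hat = LinearRegression().fit(x[r == 1], y[r == 1]).predict(x)

# AIPW estimator of E[Y]: outcome-regression term plus inverse-probability-weighted residual.
mu_aipw = np.mean(m_hat + r * (y - m_hat) / pi_hat)
mu_cc = y[r == 1].mean()                                           # naive complete-case mean

print(f"AIPW = {mu_aipw:.3f}, complete-case = {mu_cc:.3f}, true mean = 1.000")
```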

4.7. Group Logistic Regression Models with $\ell_{p,q}$ Regularization by Zhang et al. [65] (Nine Citations)

This study introduces a logistic regression framework employing $\ell_{p,q}$ regularization designed to yield group sparse solutions. Such a model is applicable to variable selection tasks where features exhibit sparse group structures. In high-dimensional data settings, solutions to practical problems often manifest group sparsity, necessitating the study of such models. The proposed model is characterized from theoretical, algorithmic, and numerical standpoints. Theoretically, by incorporating the group-restricted eigenvalue condition, an oracle inequality—a crucial property for variable selection—is established. A global recovery bound for the logistic regression model with $\ell_{p,q}$ regularization is also derived. Algorithmically, the alternating direction method of multipliers (ADMM) is adapted for the model solution, with effective methods for its subproblems. Numerically, experiments on simulated and real-world factor stock selection data demonstrate the model’s efficacy in variable selection and prediction, utilizing the presented ADMM algorithm.

4.8. Heterogeneous Overdispersed Count Data Regressions via Double-Penalized Estimations by Li et al. [66] (Five Citations)

High-dimensional negative binomial regression (NBR) for count data has gained considerable traction in diverse scientific fields. However, a common assumption in many studies is a constant dispersion parameter, which may not hold in practical scenarios. This investigation addresses variable selection and dispersion estimation in heterogeneous NBR models, where the dispersion parameter is treated as a function. Specifically, a double-regression framework is proposed, with a double $\ell_1$ penalty applied to both regression components. Under restricted eigenvalue conditions and using concentration inequalities for empirical processes, oracle inequalities for the Lasso estimators of the two partial regression coefficients are established for the first time. These oracle inequalities further provide theoretical guarantees for consistency and convergence rates of the estimators, facilitating subsequent statistical inference. The effectiveness of these novel methods is substantiated through simulation studies and a real data analysis.

4.9. Representation Theorem and Functional CLT for RKHS-Based Function-on-Function Regressions by Huang et al. [67]

This study explores a nonparametric varying coefficient regression framework for modeling and estimating regression effects between two functionally correlated datasets. Modern biomedical research often involves measuring multiple patient features over time or at discrete intervals to understand biological mechanisms; conventional statistical models may yield biased estimates if they fail to adequately incorporate interventions and their dynamic responses. A shared parameter change point function-on-function regression model is introduced to assess pre- and post-intervention temporal trends, alongside a likelihood-based estimation method for intervention effects and other parameters. New methodologies are also developed for estimating and testing regression parameters for functional data using reproducing kernel Hilbert spaces (RKHSs). These regression parameter estimators are derived in closed form, avoiding large matrix inversions, thereby enhancing computational efficiency and applicability. By establishing a representation theorem and a functional central limit theorem, the asymptotic properties of these estimators are determined, and corresponding hypothesis tests are formulated. The method’s statistical properties and application are illustrated using an immunotherapy clinical trial for advanced myeloma and through simulation studies.

4.10. Sharper Sub-Weibull Concentrations by Zhang and Wei [68] (Thirty-Two Citations)

Constant-specified and exponential concentration inequalities are fundamental to the finite-sample theory prevalent in machine learning and high-dimensional statistics. This study derives more precise, constant-specified concentration inequalities for sums of independent sub-Weibull random variables. These new bounds exhibit a mixed-tail behavior, with sub-Gaussian behavior for small deviations and sub-Weibull behavior for larger deviations from the mean, thereby improving upon existing bounds by offering sharper constants. A novel sub-Weibull parameter is introduced, facilitating the recovery of tight concentration inequalities for random variables or vectors. In statistical applications, an $\ell_2$ error bound is provided for estimated coefficients in negative binomial regressions for scenarios with heavy-tailed, sparse sub-Weibull covariates, a new contribution for such regressions. Within random matrix theory, non-asymptotic versions of the Bai–Yin theorem are developed for sub-Weibull entries, featuring exponential tail bounds. Finally, the paper demonstrates a sub-Weibull confidence region for log-truncated Z-estimators without requiring second-moment conditions.
Citation counts were obtained from Google Scholar on 28 June 2025.

Funding

H.Z. is supported in part by the National Natural Science Foundation of China (no. 12101630) and by Beihang University under the Youth Talent Start-up Funding Project (no. KG16329201).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Portnoy, S. On the central limit theorem in R^p when p→∞. Probab. Theory Relat. Fields 1986, 73, 571–583. [Google Scholar] [CrossRef]
  2. He, X.; Shao, Q.M. On parameters of increasing dimensions. J. Multivar. Anal. 2000, 73, 120–135. [Google Scholar] [CrossRef]
  3. Sur, P.; Chen, Y.; Candès, E.J. The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square. Probab. Theory Relat. Fields 2019, 175, 487–558. [Google Scholar] [CrossRef]
  4. Zhang, H. A note on “MLE in logistic regression with a diverging dimension”. arXiv 2018, arXiv:1801.08898. [Google Scholar]
  5. Montanari, A.; Zhong, Y.; Zhou, K. Tractability from overparametrization: The example of the negative perceptron. Probab. Theory Relat. Fields 2024, 188, 805–910. [Google Scholar] [CrossRef]
  6. Li, Y.; Xie, J.; Zhou, G.; Zhou, W. Sequential estimation of high-dimensional signal plus noise models under general elliptical frameworks. J. Multivar. Anal. 2025, 207, 105403. [Google Scholar]
  7. Giraud, C. Introduction to High-Dimensional Statistics, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2021. [Google Scholar]
  8. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
  9. Buhlmann, P.; van de Geer, S.A. Statistics for High-Dimensional Data: Methods, Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  10. Zhang, H.; Chen, S.X. Concentration inequalities for statistical inference. Commun. Math. Res. 2021, 37, 1–85. [Google Scholar] [CrossRef]
  11. Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science; Cambridge University Press: Cambridge, UK, 2018; Volume 47. [Google Scholar]
  12. Tropp, J.A. An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 2015, 8, 1–230. [Google Scholar] [CrossRef]
  13. Wu, C.F.J. The myth and debunking of the big four. Chin. J. Appl. Probab. Stat. 2025, 41, 329–338. [Google Scholar]
  14. van de Geer, S.A. High-dimensional generalized linear models and the lasso. Ann. Stat. 2008, 36, 614–645. [Google Scholar] [CrossRef]
  15. Zhang, H.; Jia, J. Elastic-net regularized high-dimensional negative binomial regression. Stat. Sin. 2022, 32, 181–207. [Google Scholar]
  16. Wainwright, M.J. High-Dimensional Statistics: A Non-Asymptotic Viewpoint; Cambridge University Press: Cambridge, UK, 2019; Volume 48. [Google Scholar]
  17. Zhou, P.J.; Wei, H.Y.; Zhang, H.M. Selective reviews of bandit problems in AI via a statistical view. Mathematics 2025, 4, 665. [Google Scholar] [CrossRef]
  18. Zhang, H.; Huang, H. Concentration for multiplier empirical processes with dependent weights. AIMS Math. 2023, 8, 28738–28752. [Google Scholar] [CrossRef]
  19. Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis: Methods and Case Studies; Springer: New York, NY, USA, 2002. [Google Scholar]
  20. Giné, E.; Nickl, R. Mathematical Foundations of Infinite-Dimensional Statistical Models; Cambridge University Press: Cambridge, UK, 2015. [Google Scholar]
  21. Hall, P.; Hosseini-Nasab, M. Theory for high-order bounds in functional principal components analysis. In Mathematical Proceedings of the Cambridge Philosophical Society; Cambridge University Press: Cambridge, UK, 2009; Volume 146, pp. 225–256. [Google Scholar]
  22. Hsing, T.; Eubank, R. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators; Wiley: Hoboken, NJ, USA, 2015. [Google Scholar]
  23. Dubey, P.; Müller, H.G. Functional models for time-varying random objects. J. R. Stat. Soc. Ser. B Stat. Methodol. 2020, 82, 275–327. [Google Scholar] [CrossRef]
  24. Petersen, A.; Müller, H.G. Functional data analysis for density functions by transformation to a Hilbert space. Ann. Stat. 2016, 44, 183–218. [Google Scholar] [CrossRef]
  25. Petersen, A.; Müller, H.G. Fréchet regression for random objects with Euclidean predictors. Ann. Stat. 2019, 47, 691–719. [Google Scholar] [CrossRef]
  26. Tavakoli, S.; Pigoli, D.; Aston, J.A.; Coleman, J.S. A spatial modeling approach for linguistic object data: Analyzing dialect sound variations across great Britain. J. Am. Stat. Assoc. 2019, 114, 1081–1096. [Google Scholar] [CrossRef]
  27. Petersen, A.; Müller, H.G. Wasserstein covariance for multiple random densities. Biometrika 2019, 106, 339–351. [Google Scholar] [CrossRef]
  28. Chen, F.; Chen, Y.; Ying, Z.; Zhou, K. Dynamic factor analysis of high-dimensional recurrent events. arXiv 2024, arXiv:2405.19803. [Google Scholar] [CrossRef]
  29. Russo, D.; Zou, J. How much does your data exploration overfit? Controlling bias via information usage. IEEE Trans. Inf. Theory 2019, 66, 302–323. [Google Scholar] [CrossRef]
  30. Buldygin, V.V.; Kozachenko, I.V. Metric Characterization of Random Variables and Random Processes; American Mathematical Society: Providence, RI, USA, 2000; Volume 188. [Google Scholar]
  31. Horowitz, J.L.; Lee, S. Inference in a class of optimization problems: Confidence regions and finite sample bounds on errors in coverage probabilities. J. Bus. Econ. Stat. 2023, 41, 927–938. [Google Scholar] [CrossRef]
  32. Yang, X.; Liu, X.; Wei, H. Concentration inequalities of MLE and robust MLE. Commun. Stat.-Theory Methods 2024, 53, 6944–6956. [Google Scholar] [CrossRef]
  33. Li, Y.; Tian, B. Non-asymptotic sub-Gaussian error bounds for hypothesis testing. Stat. Probab. Lett. 2022, 189, 109586. [Google Scholar] [CrossRef]
  34. Knight, K.; Fu, W. Asymptotics for lasso-type estimators. Ann. Stat. 2000, 28, 1356–1378. [Google Scholar]
  35. Greenshtein, E.; Ritov, Y. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 2004, 10, 971–988. [Google Scholar] [CrossRef]
  36. Bickel, P.J.; Ritov, Y.A.; Tsybakov, A.B. Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 2009, 37, 1705–1732. [Google Scholar] [CrossRef]
  37. Blazere, M.; Loubes, J.M.; Gamboa, F. Oracle inequalities for a group lasso procedure applied to generalized linear models in high dimension. IEEE Trans. Inf. Theory 2014, 60, 2303–2318. [Google Scholar] [CrossRef]
  38. Tsybakov, A.B. Introduction to Nonparametric Estimation; Springer: New York, NY, USA, 2008. [Google Scholar]
  39. Wei, H.; Lei, X.; Han, Y.; Zhang, H. High-dimensional inference and FDR control for simulated Markov random fields. arXiv 2022, arXiv:2202.05612. [Google Scholar]
  40. Xiao, Y.; Yan, T.; Zhang, H.; Zhang, Y. Oracle inequalities for weighted group lasso in high-dimensional misspecified Cox models. J. Inequalities Appl. 2020, 2020, 252. [Google Scholar] [CrossRef]
  41. Han, D.; Xie, J.; Liu, J.; Sun, L.; Huang, J.; Jiang, B.; Kong, L. Inference on high-dimensional single-index models with streaming data. J. Mach. Learn. Res. 2024, 25, 1–68. [Google Scholar]
  42. Fu, S.; Chen, P.; Ye, Z. Simplex-based proximal multicategory support vector machine. IEEE Trans. Inf. Theory 2022, 69, 2427–2451. [Google Scholar] [CrossRef]
  43. Fu, S.; Chen, P.; Liu, Y.; Ye, Z. Simplex-based multinomial logistic regression with diverging numbers of categories and covariate. Stat. Sin. 2023, 33, 2463–2493. [Google Scholar] [CrossRef]
  44. Zhang, B.; Huang, H.; Chen, J. Estimation of large-dimensional covariance matrices via second-order Stein-type regularization. Entropy 2023, 25, 53. [Google Scholar] [CrossRef]
  45. Tian, W.; Qin, Z. Block Toeplitz sparse precision matrix estimation for large-scale interval-valued time series forecasting. arXiv 2025, arXiv:2504.03322. [Google Scholar]
  46. Zhuang, R.; Lederer, J. Maximum regularized likelihood estimators: A general prediction theory and applications. Stat 2018, 7, e186. [Google Scholar] [CrossRef]
  47. Javanmard, A.; Javadi, H. False discovery rate control via debiased lasso. Electron. J. Stat. 2019, 13, 1212–1253. [Google Scholar] [CrossRef]
  48. Han, Y.; Guo, X.; Zou, C. Model-free controlled variable selection via data splitting. Sci. Sin. Math. 2024. [Google Scholar] [CrossRef]
  49. Hall, P.; Horowitz, J.L. Methodology and convergence rates for functional linear regression. Ann. Stat. 2007, 35, 70–91. [Google Scholar] [CrossRef]
  50. Dou, W.W.; Pollard, D.; Zhou, H.H. Estimation in functional regression for general exponential families. Ann. Stat. 2012, 40, 2421–2451. [Google Scholar] [CrossRef]
  51. Yang, X.; Song, S.; Zhang, H. Law of iterated logarithm and model selection consistency for generalized linear models with independent and dependent responses. Front. Math. China 2021, 16, 825–856. [Google Scholar] [CrossRef]
  52. Zhang, H.; Lei, X. Growing-dimensional partially functional linear models: Non-asymptotic optimal prediction error. Phys. Scr. 2023, 98, 095216. [Google Scholar] [CrossRef]
  53. Cai, T.T.; Yuan, M. Optimal estimation of the mean function based on discretely sampled functional data: Phase transition. Ann. Stat. 2011, 39, 2330–2355. [Google Scholar] [CrossRef]
  54. Zhang, J.T.; Chen, J. Statistical inferences for functional data. Ann. Stat. 2007, 35, 1052–1079. [Google Scholar] [CrossRef]
  55. Yao, F.; Müller, H.G.; Wang, J.L. Functional linear regression analysis for longitudinal data. Ann. Stat. 2005, 33, 2873–2903. [Google Scholar] [CrossRef]
  56. Zhang, X.; Wang, J.L. From sparse to dense functional data and beyond. Ann. Stat. 2016, 44, 2281–2321. [Google Scholar] [CrossRef]
  57. Zhou, H.; Yao, F.; Zhang, H. Functional linear regression for discretely observed data: From ideal to reality. Biometrika 2023, 110, 381–393. [Google Scholar] [CrossRef]
  58. Zhou, H.; Wei, D.; Yao, F. Theory of functional principal component analysis for noisy and discretely observed data. Ann. Stat. 2025. Available online: https://imstat.org/journals-and-publications/annals-of-statistics/annals-of-statistics-future-papers/ (accessed on 6 July 2025).
  59. Liu, X.; Yu, P.; Shi, J. Estimation for partial functional multiplicative regression model. Mathematics 2025, 13, 471. [Google Scholar] [CrossRef]
  60. Teng, G.; Li, Y.; Tian, B.; Li, J. Sharper concentration inequalities for median-of-mean processes. Mathematics 2023, 11, 3730. [Google Scholar] [CrossRef]
  61. Mai, T.T. On a low-rank matrix single-index model. Mathematics 2023, 11, 2065. [Google Scholar] [CrossRef]
  62. Zhang, Y.; Guo, X.; Liu, J.; Chen, X. Generalizations of the kantorovich and wielandt inequalities with applications to statistics. Mathematics 2024, 12, 2860. [Google Scholar] [CrossRef]
  63. Yang, X.; Pan, L.; Cheng, K.; Liu, C. Optimal non-asymptotic bounds for the sparse β model. Mathematics 2023, 11, 4685. [Google Scholar] [CrossRef]
  64. Wang, F.; Deng, Y. Non-asymptotic bounds of AIPW estimators for means with missingness at random. Mathematics 2023, 11, 818. [Google Scholar] [CrossRef]
  65. Zhang, Y.; Wei, C.; Liu, X. Group logistic regression models with ℓp,q regularization. Mathematics 2022, 10, 2227. [Google Scholar] [CrossRef]
  66. Li, S.; Wei, H.; Lei, X. Heterogeneous overdispersed count data regressions via double-penalized estimations. Mathematics 2022, 10, 1700. [Google Scholar] [CrossRef]
  67. Huang, H.; Mo, G.; Li, H.; Fang, H.B. Representation theorem and functional CLT for RKHS-based function-on-function regressions. Mathematics 2022, 10, 2507. [Google Scholar] [CrossRef]
  68. Zhang, H.; Wei, H. Sharper sub-weibull concentrations. Mathematics 2022, 10, 2252. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
