Symmetry
  • Article
  • Open Access

30 January 2026

Least Absolute Deviation Estimation for Uncertain Regression Model via Uncertainty Distribution and Its Application in Sport Statistics

School of Physical Education, Beijing Jiaotong University, Beijing 100124, China
This article belongs to the Special Issue Symmetry Analysis of Uncertainty Theory and Uncertain Statistics and Their Interdisciplinary Applications

Abstract

Uncertain regression analysis is a powerful tool for analyzing and interpreting the complex relationships between explanatory and response variables under uncertain environments, and a crucial step in analyzing datasets containing complex uncertainties is statistical inference based on uncertain parameter estimation methods. However, the existing parameter estimation studies of uncertain regression models all fail to effectively avoid the negative impact of outliers on the estimation results. To solve the above problem and further enrich the parameter estimation research, this paper constructs a symmetric statistical invariant for the uncertain regression model based on observed data and uncertain disturbance terms. Based on this statistical invariant, the least absolute deviation criterion is applied to propose a least absolute deviation estimation for the uncertain regression model. Finally, two numerical examples are provided to illustrate the advantages of the proposed method compared to existing methods, and the comparative results show that in certain scenarios, the least absolute deviation estimation method exhibits superior performance compared to other existing methods in terms of mean squared error, mean absolute error, and mean absolute percentage error. Furthermore, as a byproduct of this paper, the proposed method is applied to sports statistics, and two empirical cases are also provided to demonstrate the effectiveness of this application.

1. Introduction

As a core statistical method for exploring relationships between variables, regression analysis has evolved through continuous theoretical innovation and paradigm expansion in response to the ever-emerging challenges of real-world data. This evolution can be traced back to the end of the 19th century, when Galton [1] revealed the law of “regression to the mean” in genetic phenomena and thereby laid the empirical foundation for the concept of regression. Subsequently, classical linear regression analysis was established through the theoretical refinement of the least squares method and became the cornerstone of relationship modeling; classical discussions of this topic can be found in Draper and Smith [2].
As application scenarios have become increasingly complex, a series of key model variants have emerged. To handle categorical response variables, Berkson [3] systematically proposed a logistic regression model based on the logit function. To overcome the estimation instability caused by multicollinearity among independent variables, Hoerl and Kennard [4] introduced ridge regression with an L2 penalty. In the era of high-dimensional data analysis, Tibshirani [5] proposed lasso regression with an L1 penalty to perform variable selection and parameter estimation simultaneously. To combine the advantages of ridge regression and lasso regression and to handle groups of highly correlated variables, Zou and Hastie [6] further developed elastic net regression.
Overall, regression analysis has evolved from a tool describing linear relationships into a vast methodological system encompassing parametric, nonparametric, and regularized methods, its development driven by a deep understanding and modeling need for complex data structures. For those wishing to gain a deep understanding of regression analysis methods and keep abreast of its latest developments, the following works are widely considered to be of significant reference value. The work of Freund et al. [7] systematically explained the principles of regression model construction, statistical inference, and diagnostic techniques, covering a variety of application scenarios from basic to advanced. The work of Sen and Srivastava [8] focused on the combination of theoretical framework and computational methods, providing readers with rigorous mathematical derivations and practical implementation guidance, and the work of Chatterjee and Simonoff [9] focused on the theoretical expansion and practical application of linear and generalized linear models, and explored contemporary hot topics such as high-dimensional data modeling. These documents not only comprehensively review the core concepts and application paradigms of regression analysis but also deeply analyze important recent advances in the field, such as regularization methods and nonparametric techniques, providing researchers with a crucial perspective on the evolution and cutting-edge trends of the discipline.
The aforementioned classic research methods are all based on probability theory: they treat nondeterministic phenomena arising from randomness in the real world as stochastic models with repeatability and statistical regularity, and characterize them using mathematical tools such as probability distributions and random variables. In practice, however, another fundamentally different type of uncertainty is also widespread. Its core characteristic is that the related phenomena do not have a stable frequency of occurrence, or that reliable statistical regularities are difficult to obtain through a large number of repeated experiments. This type of uncertainty often stems from incomplete information, cognitive ambiguity, system complexity, or lack of knowledge, and its essence is closer to cognitive uncertainty than to objective randomness. If probability theory is forcibly used to model a system with cognitive uncertainty, the observed frequencies will be unstable, which violates the frequency stability assumption on which the theory rests. Relevant references can be found in Jiang and Ye [10] and Liu [11]. Therefore, traditional probabilistic frameworks often face limitations in theoretical foundation or empirical evidence when dealing with such cognitive or knowledge uncertainty. To rigorously characterize this cognitive uncertainty, which cannot be fully described by the classical probability framework, Liu [12] and Liu [13] established and systematized a new axiomatic system of mathematics, named uncertainty theory, which aims to provide a self-consistent quantitative and analytical tool for non-random phenomena that lack sufficient historical data, rely on expert beliefs, or occur only once.
The regression analysis paradigm developed on this theoretical basis naturally gave rise to uncertain regression analysis, which was investigated by Yao and Liu [14]. This emerging branch replaces the random error term or parameters in traditional regression models with uncertain variables that follow the axiomatic system of uncertainty theory, thus constructing a new type of uncertain regression model.
Building upon the work of Yao and Liu [14], the uncertain regression model system has been systematically expanded and deepened. For example, to address the modeling needs of multiple response variables, Ye and Liu [15] proposed a multiple uncertain regression model, achieving simultaneous analysis of the correlation structure between multidimensional variables. Many scholars have also studied other types of uncertain regression models, such as uncertain regression models with autoregressive time series errors (Chen [16]) and moving average time series errors (Chen [17]), the uncertain panel regression model (Jiang and Ye [10]), and the uncertain nonparametric regression model (Ding and Zhang [18]). In addition, the statistical inference of unknown parameters and uncertain disturbance terms for uncertain regression models has long attracted academic attention. Research on the statistical inference of unknown parameters has yielded rich results, with representative works including least squares estimation (Yao and Liu [14]), least absolute deviations estimation (Liu and Yang [19]), uncertain maximum likelihood estimation (Lio and Liu [20]; Liu and Qin [21]), Tukey’s biweight estimation (Chen [22]), and moment estimation (Liu [11]). The statistical inference of uncertain disturbance terms has also accumulated a series of key results, including moment estimation (Lio and Liu [23]), uncertain maximum likelihood estimation (Lio and Liu [20]; Liu and Liu [24]), and least squares estimation (Liu and Liu [25]).
However, the aforementioned parameter estimation studies of uncertain regression models all fail to effectively avoid the negative impact of outliers on the estimation results. In particular, the least squares estimation method based on the uncertainty distribution function suffers from this problem. Because its objective function is built from the deviation between the empirical distribution function and the population distribution function, squaring increases the relative weight of the deviations caused by outliers. To address this issue, this paper applies the least absolute deviation criterion to construct the least absolute deviation estimation for the uncertain regression model. The main contributions of this paper are as follows:
  • A symmetric statistical invariant based on the uncertain regression model is constructed, and the least absolute deviation estimation of the uncertain regression model is proposed by applying the least absolute deviation criterion to this statistical invariant.
  • The least absolute deviation estimation is applied to the uncertain linear regression model, the uncertain exponential growth model, and the uncertain logistic decay model, and the advantages of the proposed method are illustrated with two numerical examples.
  • The proposed method is applied to two typical scenarios in sports statistics, and the corresponding uncertain statistical models are studied based on real data.

2. Estimating Unknown Parameters of Uncertainty Distribution via the Least Absolute Deviation Criterion

Determining the unknown parameters of the uncertainty distribution function is a fundamental step in constructing an uncertain statistical model. This process provides methodological support for subsequent statistical inference and decision-making, making it possible to achieve robust optimization and precise risk management of the system in an uncertain environment. Specifically, consider a family of uncertainty distribution functions $\{\Phi_{\beta} : \beta \in \Theta\}$ with unknown parameters, where $\beta$ is the vector of unknown parameters to be determined and $\Theta$ is the parameter space. For an uncertain variable that follows a distribution in this family, we assume that a set of observed data $x_1, x_2, \ldots, x_n$ can be obtained. The parameter estimation problem in uncertain statistics is then to determine the vector of unknown parameters $\beta$ based on these observed data and the operational rules of uncertainty theory.
One widely adopted idea in uncertain statistics is to construct a mathematical proximity measure and to search, within the preset family $\{\Phi_{\beta} : \beta \in \Theta\}$, for the population distribution whose distance from the empirical distribution function of the observed data $x_1, x_2, \ldots, x_n$ is minimized, thereby determining the vector of unknown parameters $\beta$. Note that the empirical distribution function of the observed data $x_1, x_2, \ldots, x_n$ is
$$\frac{\sum_{i=1}^{n} I\left(x_i \in (-\infty, x]\right)}{n}.$$
Based on this, Ning and Liu [26] constructed an objective function that estimates $\beta$ by minimizing the sum of the absolute differences between the empirical distribution function of the observed data and the assumed population distribution function $\Phi_{\beta}$. Specifically, the least absolute deviation estimation based on the uncertainty distribution can be transformed into the following optimization problem,
$$\min_{\beta \in \Theta} \sum_{i=1}^{n} \left| \frac{\sum_{j=1}^{n} I\left(x_j \in (-\infty, x_i]\right)}{n} - \Phi_{\beta}(x_i) \right|. \tag{1}$$
To facilitate the subsequent construction of theoretical models and mathematical derivations, the parameter estimation based on the least absolute deviation criterion under the assumption of a normal uncertainty distribution is presented below.
Assume that a set of independent and identically distributed observed data $x_1, x_2, \ldots, x_n$ follows a normal uncertainty distribution $\mathcal{N}(\mu, \sigma)$, where $\mu$ is the expected value parameter and $\sigma > 0$ is the standard deviation parameter. Note that the distribution function of the normal uncertainty distribution $\mathcal{N}(\mu, \sigma)$ is
$$\Phi_{\mu,\sigma}(x) = \frac{1}{\exp\left(\dfrac{\pi(\mu - x)}{\sqrt{3}\,\sigma}\right) + 1}.$$
Then the least absolute deviation estimations of the unknown parameters $\mu$ and $\sigma$ are the solutions to the following optimization problem,
$$\min_{\mu,\ \sigma > 0} \sum_{i=1}^{n} \left| \frac{\sum_{j=1}^{n} I\left(x_j \in (-\infty, x_i]\right)}{n} - \frac{1}{\exp\left(\dfrac{\pi(\mu - x_i)}{\sqrt{3}\,\sigma}\right) + 1} \right|.$$
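The objective for the normal uncertainty distribution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample data, and the coarse grid search are hypothetical and are not taken from the paper, which does not say how the optimization problem is solved numerically.

```python
import math


def normal_ud_cdf(x, mu, sigma):
    """Normal uncertainty distribution: 1 / (exp(pi*(mu - x) / (sqrt(3)*sigma)) + 1)."""
    z = math.pi * (mu - x) / (math.sqrt(3.0) * sigma)
    if z > 700.0:  # exp would overflow; the distribution value is effectively 0
        return 0.0
    return 1.0 / (math.exp(z) + 1.0)


def empirical_cdf(xs, x):
    """Fraction of the observations that are less than or equal to x."""
    return sum(1 for v in xs if v <= x) / len(xs)


def lad_objective(mu, sigma, xs):
    """Sum over i of |empirical_cdf(x_i) - Phi_{mu,sigma}(x_i)|."""
    return sum(abs(empirical_cdf(xs, xi) - normal_ud_cdf(xi, mu, sigma)) for xi in xs)


def grid_search(xs, mus, sigmas):
    """Return (objective, mu, sigma) with the smallest objective on a finite grid."""
    return min((lad_objective(m, s, xs), m, s) for m in mus for s in sigmas)


# Hypothetical observations (the last value plays the role of an outlier).
data = [1.2, 1.9, 2.4, 2.7, 3.1, 3.3, 3.8, 4.4, 5.0, 9.5]
mus = [i / 10 for i in range(10, 61)]      # candidate values 1.0, 1.1, ..., 6.0
sigmas = [i / 10 for i in range(2, 41)]    # candidate values 0.2, 0.3, ..., 4.0
best_value, best_mu, best_sigma = grid_search(data, mus, sigmas)
print(best_value, best_mu, best_sigma)
```

A grid search is used here only because the objective is non-smooth (the indicator terms make it piecewise in the parameters); any derivative-free optimizer could be substituted.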

3. Symmetric Statistical Invariant of Uncertain Regression Model

In the standard setting of uncertain regression analysis, we usually assume the existence of $p$-dimensional explanatory variables $(x_1, x_2, \ldots, x_p)$ and a corresponding response variable $y$. To explore in depth how these explanatory variables jointly affect the response variable under uncertain conditions, Yao and Liu [14] pioneered the following uncertain regression model,
$$y = f(x_1, x_2, \ldots, x_p \mid \beta) + \varepsilon. \tag{3}$$
The core innovation of this model lies in modeling the non-deterministic factors in the variable relationship that are difficult to quantify precisely as uncertain variables. Here, the function $f$ represents the deterministic relationship between the explanatory variables $(x_1, x_2, \ldots, x_p)$ and the response variable $y$, $\beta$ is the parameter vector to be estimated, and $\varepsilon$ is an uncertain disturbance term with zero expected value and standard deviation $\sigma$, denoted as $\mathcal{N}(0, \sigma)$.
Suppose that there is a set of observation sequences
$$(x_{i1}, x_{i2}, \ldots, x_{ip}, y_i)$$
with $i = 1, 2, \ldots, n$ regarding the explanatory variables $(x_1, x_2, \ldots, x_p)$ and the response variable $y$. If both the unknown parameter vector $\beta$ in the uncertain regression model (3) and the standard deviation $\sigma$ of the uncertain disturbance term $\varepsilon$ take their theoretical true values, then based on the model structure, we can derive that
$$y - f(x_1, x_2, \ldots, x_p \mid \beta) \sim \mathcal{N}(0, \sigma).$$
After standardization, the above formula becomes
$$\frac{y - f(x_1, x_2, \ldots, x_p \mid \beta)}{\sigma} \sim \mathcal{N}(0, 1).$$
Substituting the actual observed data $(x_{i1}, x_{i2}, \ldots, x_{ip}, y_i)$, $i = 1, 2, \ldots, n$, into this expression, we can define a set of real-valued functions of the parameters $\beta$ and $\sigma$ as
$$h_i(\beta, \sigma) = \frac{y_i - f(x_{i1}, x_{i2}, \ldots, x_{ip} \mid \beta)}{\sigma}, \quad i = 1, 2, \ldots, n. \tag{4}$$
The $n$ functions $h_1(\beta, \sigma), h_2(\beta, \sigma), \ldots, h_n(\beta, \sigma)$ constructed in this way can be regarded as a set of realizations of the standardized uncertain variable. According to the fundamental principles of uncertainty theory, this set of function values should follow the standard normal uncertainty distribution $\mathcal{N}(0, 1)$, that is, we can reasonably infer that
$$h_1(\beta, \sigma), h_2(\beta, \sigma), \ldots, h_n(\beta, \sigma) \sim \mathcal{N}(0, 1) \tag{5}$$
when the unknown parameters $\beta$ and $\sigma$ take their theoretical true values. Thus, the problem of estimating the unknown parameters $\beta$ and $\sigma$ in the uncertain regression model (3) is transformed into finding the parameter values for which $h_1(\beta, \sigma), h_2(\beta, \sigma), \ldots, h_n(\beta, \sigma)$ can be regarded, as closely as possible, as a set of observed values from the population distribution $\mathcal{N}(0, 1)$. Here, the population distribution $\mathcal{N}(0, 1)$ is the statistical invariant constructed in uncertain regression analysis, and the parameter estimation problem in uncertain regression analysis becomes a parameter estimation problem based on this statistical invariant. Because the uncertainty distribution of the statistical invariant $\mathcal{N}(0, 1)$ is symmetric about zero, it is also called the symmetric statistical invariant.
The following examples will present the symmetric statistical invariants of the specific uncertain regression models corresponding to observation sequences.
Example 1. 
The uncertain linear regression model is a classic and widely used statistical analysis method, whose core objective is to characterize the relationship between the response variable and one or more explanatory variables under uncertain environments through a linear function. Specifically, assuming we have a response variable $y$ and $p$ explanatory variables $x_1, x_2, \ldots, x_p$, the uncertain linear regression model can be expressed as
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon, \tag{6}$$
where $\beta_0$ is the intercept term, $\beta_1, \beta_2, \ldots, \beta_p$ are the regression coefficients corresponding to each explanatory variable, reflecting the marginal effect of each explanatory variable on the response variable, and $\varepsilon$ is the uncertain error term, which is usually assumed to follow a normal uncertainty distribution with zero mean and constant variance $\sigma^2$ in order to capture uncertain variations that the model cannot explain.
Suppose also that there is a set of observation sequences
$$(x_{i1}, x_{i2}, \ldots, x_{ip}, y_i) \tag{7}$$
with $i = 1, 2, \ldots, n$ regarding the explanatory variables $(x_1, x_2, \ldots, x_p)$ and the response variable $y$. Then it follows from (4) that the statistical invariants of the uncertain linear regression model (6) corresponding to the observation sequences (7) are
$$h_i(\beta_0, \beta_1, \ldots, \beta_p, \sigma) = \frac{y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_p x_{ip}}{\sigma} = \frac{y_i - \beta_0 - \sum_{l=1}^{p} \beta_l x_{il}}{\sigma}, \quad i = 1, 2, \ldots, n.$$
Example 2. 
The uncertain exponential growth model is a mathematical framework used to characterize the phenomenon where the growth rate is proportional to the current value under uncertain environments, characterized by slow initial growth followed by rapid acceleration. This dynamic process is commonly seen in biological population growth, financial investment appreciation, and diffusion processes. Specifically, assuming we have a response variable $y$ and an explanatory variable $x$, the uncertain exponential growth model can be expressed as
$$y = \beta_0 + \beta_1 \exp(\beta_2 x) + \varepsilon, \quad \beta_1 > 0, \ \beta_2 > 0, \tag{8}$$
where $\beta_0$ denotes the initial reference level or lower limit, $\beta_1$ represents the initial size or scaling factor when $x = 0$, $\beta_2$ is the growth rate parameter, which determines the speed of process expansion, and $\varepsilon$ is the uncertain error term, which is usually assumed to follow a normal uncertainty distribution with zero mean and constant variance $\sigma^2$ in order to capture uncertain variations that the model cannot explain.
Suppose also that there is a set of observation sequences
$$(x_i, y_i) \tag{9}$$
with $i = 1, 2, \ldots, n$ regarding the explanatory variable $x$ and the response variable $y$. Then it follows from (4) that the statistical invariants of the uncertain exponential growth model (8) corresponding to the observation sequences (9) are
$$h_i(\beta_0, \beta_1, \beta_2, \sigma) = \frac{y_i - \beta_0 - \beta_1 \exp(\beta_2 x_i)}{\sigma}, \quad i = 1, 2, \ldots, n.$$
Example 3. 
The uncertain logistic decay model is a mathematical model used to describe the dynamic characteristics of a decay process under uncertain environments, which typically unfolds in three stages: initial decay with slow acceleration, a significant increase in the decay rate in the middle stage, and a gradual leveling off in the later stage. This S-shaped decay curve is commonly seen in natural and social phenomena such as biodegradation processes, chemical decomposition, and increasing market saturation. Specifically, assuming we have a response variable $y$ and an explanatory variable $x$, the uncertain logistic decay model can be expressed as
$$y = \beta_0 + \frac{\beta_1}{1 + \beta_2 \exp(\beta_3 x)} + \varepsilon, \tag{10}$$
where $\beta_0$ denotes the initial reference level, $\beta_1$ represents the initial value or maximum possible value of the decay process, $\beta_2$ controls the horizontal position of the curve, $\beta_3$ determines the decay rate, and $\varepsilon$ is the uncertain error term, which is usually assumed to follow a normal uncertainty distribution with zero mean and constant variance $\sigma^2$ in order to capture uncertain variations that the model cannot explain.
Suppose also that there is a set of observation sequences
$$(x_i, y_i) \tag{11}$$
with $i = 1, 2, \ldots, n$ regarding the explanatory variable $x$ and the response variable $y$. Then it follows from (4) that the statistical invariants of the uncertain logistic decay model (10) corresponding to the observation sequences (11) are
$$h_i(\beta_0, \beta_1, \beta_2, \beta_3, \sigma) = \frac{y_i - \beta_0 - \dfrac{\beta_1}{1 + \beta_2 \exp(\beta_3 x_i)}}{\sigma}, \quad i = 1, 2, \ldots, n.$$
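The statistical invariants in Examples 1 to 3 differ only in the regression function $f$, so they can be computed by one routine that takes the model as an argument. The following Python sketch illustrates this; the function names and the parameter layout (a list `beta` whose first entry is the intercept) are assumptions made for the example and do not come from the paper.

```python
import math


def linear(x, beta):
    """beta[0] + beta[1]*x[0] + ... + beta[p]*x[p-1], where x is a sequence of p values."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def exponential_growth(x, beta):
    """beta[0] + beta[1] * exp(beta[2] * x) for a scalar x."""
    return beta[0] + beta[1] * math.exp(beta[2] * x)


def logistic_decay(x, beta):
    """beta[0] + beta[1] / (1 + beta[2] * exp(beta[3] * x)) for a scalar x."""
    return beta[0] + beta[1] / (1.0 + beta[2] * math.exp(beta[3] * x))


def standardized_residuals(f, beta, sigma, xs, ys):
    """h_i(beta, sigma) = (y_i - f(x_i | beta)) / sigma for every observation."""
    return [(y - f(x, beta)) / sigma for x, y in zip(xs, ys)]


# Hypothetical linear example with a single explanatory variable.
print(standardized_residuals(linear, [1.0, 2.0], 0.5, [[0.0], [1.0]], [1.5, 2.0]))
```

If the parameters were at their true values, the returned list would be treated as a sample from the standard normal uncertainty distribution.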

4. Least Absolute Deviation Estimation of Uncertain Regression Model

This section will systematically study the parameter estimation problem in uncertain regression analysis based on the least absolute deviation criterion, focusing on how to determine the least absolute deviation estimation of the unknown parameter vector and disturbance term in the model within the framework of uncertainty theory, and thus establish a new parameter estimation method in uncertain regression analysis.
Based on the above analysis, the least absolute deviation estimation problem for the unknown parameters and the uncertain disturbance term in the uncertain regression model (3) is transformed into the least absolute deviation estimation problem for an uncertainty distribution function, with $h_1(\beta, \sigma), h_2(\beta, \sigma), \ldots, h_n(\beta, \sigma)$ as the observed data and $\mathcal{N}(0, 1)$ as the population distribution.
It should be noted that the empirical distribution function constructed from the function values $h_1(\beta, \sigma), h_2(\beta, \sigma), \ldots, h_n(\beta, \sigma)$ can be expressed as
$$\frac{\sum_{i=1}^{n} I\left(h_i(\beta, \sigma) \in (-\infty, x]\right)}{n}.$$
Moreover, the theoretical distribution function of the standard normal uncertainty distribution $\mathcal{N}(0, 1)$ has the following analytical form,
$$\frac{1}{\exp\left(-\dfrac{\pi x}{\sqrt{3}}\right) + 1}.$$
Then, based on the least absolute deviation criterion and optimization problem (1), the estimated values of the parameters $\beta$ and $\sigma$ can be obtained by solving the following optimization problem,
$$\min_{\beta,\ \sigma > 0} \sum_{i=1}^{n} \left| \frac{\sum_{j=1}^{n} I\left(h_j(\beta, \sigma) \in (-\infty, h_i(\beta, \sigma)]\right)}{n} - \frac{1}{\exp\left(-\dfrac{\pi h_i(\beta, \sigma)}{\sqrt{3}}\right) + 1} \right|. \tag{12}$$
Here the solutions $\hat{\beta}$ and $\hat{\sigma}$ to the optimization problem (12) are referred to as the least absolute deviation estimation of the uncertain regression model (3).
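As a rough illustration, the objective in (12) can be evaluated for any regression function as follows. The names `standard_normal_ud_cdf` and `lad_regression_objective`, and the calling convention `f(x, beta)`, are assumptions made for this sketch rather than notation from the paper.

```python
import math


def standard_normal_ud_cdf(h):
    """Standard normal uncertainty distribution: 1 / (exp(-pi*h/sqrt(3)) + 1)."""
    z = -math.pi * h / math.sqrt(3.0)
    if z > 700.0:  # exp would overflow; the distribution value is effectively 0
        return 0.0
    return 1.0 / (math.exp(z) + 1.0)


def lad_regression_objective(f, beta, sigma, xs, ys):
    """Sum over i of |empirical distribution at h_i - Phi(h_i)|, with h_i = (y_i - f(x_i|beta))/sigma."""
    h = [(y - f(x, beta)) / sigma for x, y in zip(xs, ys)]
    n = len(h)
    total = 0.0
    for hi in h:
        empirical = sum(1 for hj in h if hj <= hi) / n  # share of h_j in (-inf, h_i]
        total += abs(empirical - standard_normal_ud_cdf(hi))
    return total


# Hypothetical straight-line model and data, for illustration only.
def line(x, beta):
    return beta[0] + beta[1] * x


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.2, 4.9, 7.1, 20.0]
print(lad_regression_objective(line, [1.0, 2.0], 1.0, xs, ys))
```

Because each term is an absolute difference of two values in $[0, 1]$, a single extreme residual contributes at most 1 to the sum, whatever its magnitude.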
The following examples will present the least absolute deviation estimations of the specific uncertain regression models.
Example 4. 
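According to optimization problem (12), the least absolute deviation estimations of $\beta_0, \beta_1, \ldots, \beta_p$ and $\sigma$ in Example 1 solve the following optimization problem
$$\min_{\beta_0, \beta_1, \ldots, \beta_p,\ \sigma > 0} \sum_{i=1}^{n} \left| \frac{\sum_{j=1}^{n} I\left(y_j - \sum_{l=1}^{p} \beta_l x_{jl} \le y_i - \sum_{l=1}^{p} \beta_l x_{il}\right)}{n} - \left( \exp\left( -\frac{\pi \left( y_i - \beta_0 - \sum_{l=1}^{p} \beta_l x_{il} \right)}{\sqrt{3}\,\sigma} \right) + 1 \right)^{-1} \right|. \tag{13}$$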
Example 5. 
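According to optimization problem (12), the least absolute deviation estimations of $\beta_0, \beta_1, \beta_2$ and $\sigma$ in Example 2 solve the following optimization problem
$$\min_{\beta_0, \beta_1, \beta_2,\ \sigma > 0} \sum_{i=1}^{n} \left| \frac{\sum_{j=1}^{n} I\left(y_j - \beta_1 \exp(\beta_2 x_j) \le y_i - \beta_1 \exp(\beta_2 x_i)\right)}{n} - \left( \exp\left( -\frac{\pi \left( y_i - \beta_0 - \beta_1 \exp(\beta_2 x_i) \right)}{\sqrt{3}\,\sigma} \right) + 1 \right)^{-1} \right|. \tag{14}$$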
Example 6. 
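According to optimization problem (12), the least absolute deviation estimations of $\beta_0, \beta_1, \beta_2, \beta_3$ and $\sigma$ in Example 3 solve the following optimization problem
$$\min_{\beta_0, \beta_1, \beta_2, \beta_3,\ \sigma > 0} \sum_{i=1}^{n} \left| \frac{\sum_{j=1}^{n} I\left(y_j - \dfrac{\beta_1}{1 + \beta_2 \exp(\beta_3 x_j)} \le y_i - \dfrac{\beta_1}{1 + \beta_2 \exp(\beta_3 x_i)}\right)}{n} - \left( \exp\left( -\frac{\pi \left( y_i - \beta_0 - \dfrac{\beta_1}{1 + \beta_2 \exp(\beta_3 x_i)} \right)}{\sqrt{3}\,\sigma} \right) + 1 \right)^{-1} \right|. \tag{15}$$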
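The optimization problems in Examples 4 to 6 are non-smooth, since the indicator terms change in jumps as the parameters move, so a derivative-free method is a natural choice. The sketch below applies a simple compass (coordinate pattern) search to the single-variable linear case. The solver, the data, and the starting point are all assumptions for illustration; the paper does not state which algorithm the authors used, and a local search of this kind is not guaranteed to find the global minimum.

```python
import math


def phi(h):
    """Standard normal uncertainty distribution evaluated at h."""
    z = -math.pi * h / math.sqrt(3.0)
    return 0.0 if z > 700.0 else 1.0 / (math.exp(z) + 1.0)


def objective(theta, xs, ys):
    """LAD objective for y = b0 + b1*x + eps, with theta = (b0, b1, sigma)."""
    b0, b1, sigma = theta
    if sigma <= 0.0:
        return float("inf")  # enforce the constraint sigma > 0
    h = [(y - b0 - b1 * x) / sigma for x, y in zip(xs, ys)]
    n = len(h)
    return sum(abs(sum(1 for hj in h if hj <= hi) / n - phi(hi)) for hi in h)


def compass_search(fun, start, step=1.0, tol=1e-4, max_iter=10000):
    """Try +/- step along each coordinate; halve the step when nothing improves."""
    point = list(start)
    value = fun(point)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for k in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[k] += delta
                trial_value = fun(trial)
                if trial_value < value:
                    point, value, improved = trial, trial_value, True
        if not improved:
            step /= 2.0
    return point, value


# Hypothetical data: roughly y = 1 + 2x, with one outlier at the end.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 30.0]
start = [0.0, 1.0, 1.0]
estimate, value = compass_search(lambda t: objective(t, xs, ys), start)
print(estimate, value)
```

In practice, restarting the search from several initial points, or using a global derivative-free optimizer, would reduce the risk of stopping at a poor local solution.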

5. Numerical Examples

In this section, we will provide two numerical examples to illustrate the method of least absolute deviation estimation of the specific uncertain regression model, and compare it with existing methods to demonstrate the effectiveness of the method proposed in this paper.
Example 7. 
Consider an uncertain exponential growth model, mathematically expressed as
$$y = \beta_0 + \beta_1 \exp(\beta_2 x) + \varepsilon. \tag{16}$$
In this model, $\beta_0$, $\beta_1 > 0$, and $\beta_2 > 0$ are parameters to be estimated, and $\varepsilon$ is an uncertain disturbance term, assumed to follow a normal uncertainty distribution with an expected value of 0 and a standard deviation of $\sigma$. For empirical analysis, we consider a dataset containing 50 sets of observations, presented in Table 1 and Figure 1, reflecting the observed relationship between the independent variable $x$ and the response variable $y$.
Table 1. Dataset containing 50 sets of observations of uncertain exponential growth model (16) in Example 7.
Figure 1. Dataset containing 50 sets of observations of uncertain exponential growth model (16) in Example 7.
Denote the dataset by $(x_i, y_i)$ with $i = 1, 2, \ldots, 50$. Then it follows from Example 2 that the statistical invariants of the uncertain exponential growth model (16) corresponding to the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 50$, are
$$h_i(\beta_0, \beta_1, \beta_2, \sigma) = \frac{y_i - \beta_0 - \beta_1 \exp(\beta_2 x_i)}{\sigma}, \quad i = 1, 2, \ldots, 50.$$
By using the conclusion of Example 5, we can infer that the least absolute deviation estimations of $\beta_0, \beta_1, \beta_2$ and $\sigma$ are the optimal solutions of the following optimization problem,
$$\min_{\beta_0, \beta_1, \beta_2,\ \sigma > 0} \sum_{i=1}^{50} \left| \frac{\sum_{j=1}^{50} I\left(y_j - \beta_1 \exp(\beta_2 x_j) \le y_i - \beta_1 \exp(\beta_2 x_i)\right)}{50} - \left( \exp\left( -\frac{\pi \left( y_i - \beta_0 - \beta_1 \exp(\beta_2 x_i) \right)}{\sqrt{3}\,\sigma} \right) + 1 \right)^{-1} \right|,$$
which can be obtained as
$$\hat{\beta}_0 = 1.9804, \quad \hat{\beta}_1 = 1.0734, \quad \hat{\beta}_2 = 0.8018, \quad \hat{\sigma} = 1.0131.$$
Based on this, a fitted uncertain exponential growth model can be obtained as
$$y = 1.9804 + 1.0734 \exp(0.8018 x) + \mathcal{N}(0, 1.0131). \tag{17}$$
To test the suitability of the fitted uncertain exponential growth model (17) for the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 50$, the residuals of the estimated model corresponding to the dataset can be calculated by means of
$$\varepsilon_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 \exp(\hat{\beta}_2 x_i), \quad i = 1, 2, \ldots, 50,$$
which are shown in Table 2 and Figure 2. Setting the significance level to $\alpha = 0.05$, it follows from the uncertain hypothesis test proposed by Ye and Liu [27] that the corresponding standardized critical value is
$$\pm \frac{\sqrt{3}\,\hat{\sigma}}{\pi} \ln \frac{\alpha}{2 - \alpha} = \pm 2.0464,$$
and the test is
$$W = \left\{ (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{50}) : \text{there are at least 3 indexes } i \text{ with } 1 \le i \le 50 \text{ such that } \varepsilon_i < -2.0464 \text{ or } \varepsilon_i > 2.0464 \right\}.$$
As shown in Table 2 and Figure 2, only $\varepsilon_{45}$ is outside the interval $[-2.0464, 2.0464]$, so the condition that at least 3 residuals satisfy $|\varepsilon_i| > 2.0464$ is not met. Thus
$$(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{50}) \notin W,$$
and we fail to reject the compatibility assumption of the estimated model at significance level $\alpha = 0.05$. This result indicates that the fitted uncertain exponential growth model (17) fits the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 50$, well.
Table 2. Residuals of the fitted uncertain exponential growth model (17) corresponding to the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 50$.
Figure 2. Residuals of the fitted uncertain exponential growth model (17) corresponding to the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 50$.
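The critical value and the rejection rule used in this residual check are easy to reproduce numerically. In the Python sketch below, the function names are hypothetical, and the minimum number of out-of-range residuals needed for rejection (3 in this example) is passed in explicitly, exactly as stated in the text, rather than derived from a general rule.

```python
import math


def critical_value(sigma_hat, alpha):
    """Positive critical value: sqrt(3)*sigma_hat/pi * ln((2 - alpha)/alpha)."""
    return math.sqrt(3.0) * sigma_hat / math.pi * math.log((2.0 - alpha) / alpha)


def rejects(residuals, sigma_hat, alpha, min_count):
    """True if at least min_count residuals fall outside [-c, c]."""
    c = critical_value(sigma_hat, alpha)
    outside = sum(1 for e in residuals if e < -c or e > c)
    return outside >= min_count


# Estimated standard deviation and significance level from this example.
c = critical_value(1.0131, 0.05)
print(round(c, 3))

# Hypothetical residuals: only one lies outside the interval, so no rejection.
print(rejects([0.1, -0.4, 2.5, 1.0, -1.9], 1.0131, 0.05, min_count=3))
```

With $\hat{\sigma} = 1.0131$ and $\alpha = 0.05$ the function returns a value that agrees with the reported $2.0464$ to about three decimal places; the small difference is consistent with $\hat{\sigma}$ being rounded in the text.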
Finally, based on the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 50$, in Table 1 and Figure 1, we apply existing parameter estimation methods for uncertain regression models, namely the least squares estimation method studied by Wang et al. [28] and the uncertain maximum likelihood estimation method explored by Liu and Qin [21], to the uncertain exponential growth model (16), and obtain
$$y = 0.1011 + 2.0153 \exp(0.6578 x) + \mathcal{N}(0, 1.3340) \tag{18}$$
estimated based on the least squares estimation method and
$$y = 2.1632 + 0.8277 \exp(0.8822 x) + \mathcal{N}(0, 1.0456) \tag{19}$$
estimated based on the uncertain maximum likelihood estimation method, respectively. To further compare the performance of the models, the mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) of models (17)–(19) on the same dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 50$, are calculated as
$$\mathrm{MSE} = \frac{1}{50} \sum_{i=1}^{50} \left( \hat{\beta}_0 + \hat{\beta}_1 \exp(\hat{\beta}_2 x_i) - y_i \right)^2,$$
$$\mathrm{MAE} = \frac{1}{50} \sum_{i=1}^{50} \left| \hat{\beta}_0 + \hat{\beta}_1 \exp(\hat{\beta}_2 x_i) - y_i \right|$$
and
$$\mathrm{MAPE} = \frac{1}{50} \sum_{i=1}^{50} \left| \frac{\hat{\beta}_0 + \hat{\beta}_1 \exp(\hat{\beta}_2 x_i) - y_i}{y_i} \right|,$$
respectively. Table 3 lists the comparison results of the error indices for these three estimated models. The results in Table 3 show that the MSE, MAE and MAPE obtained based on model (17) are all lower than those corresponding to models (18) and (19). This difference indicates that, on this dataset, the least absolute deviation estimation method proposed in this paper exhibits superior fitting and prediction performance in parameter estimation of uncertain regression models.
Table 3. The MSEs, MAEs and MAPEs of estimated uncertain exponential growth models (17)–(19).
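The three error indices compared in Table 3 can be computed with the following minimal Python sketch. The function names and the toy numbers are illustrative assumptions; following the formulas above, MAPE is returned as a fraction rather than multiplied by 100, and it requires every observed value to be nonzero.

```python
def mse(predicted, observed):
    """Mean squared error between fitted values and observations."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)


def mae(predicted, observed):
    """Mean absolute error between fitted values and observations."""
    return sum(abs(p - y) for p, y in zip(predicted, observed)) / len(observed)


def mape(predicted, observed):
    """Mean absolute percentage error as a fraction; observations must be nonzero."""
    return sum(abs((p - y) / y) for p, y in zip(predicted, observed)) / len(observed)


# Toy fitted values and observations, for illustration only.
fitted = [2.0, 4.0]
actual = [1.0, 5.0]
print(mse(fitted, actual), mae(fitted, actual), mape(fitted, actual))
```

For a fitted model such as (17), `predicted` would hold the values $\hat{\beta}_0 + \hat{\beta}_1 \exp(\hat{\beta}_2 x_i)$ evaluated at each $x_i$ in the dataset.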
Example 8. 
Consider an uncertain logistic decay model, mathematically expressed as
$$y = \beta_0 + \frac{\beta_1}{1 + \beta_2 \exp(\beta_3 x)} + \varepsilon. \tag{20}$$
In this model, $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ are parameters to be estimated, and $\varepsilon$ is an uncertain disturbance term, assumed to follow a normal uncertainty distribution with an expected value of 0 and a standard deviation of $\sigma$. For empirical analysis, we consider a dataset containing 60 sets of observations, presented in Table 4 and Figure 3, reflecting the observed relationship between the independent variable $x$ and the response variable $y$.
Table 4. Dataset containing 60 sets of observations of uncertain logistic decay model (20) in Example 8.
Figure 3. Dataset containing 60 sets of observations of uncertain logistic decay model (20) in Example 8.
Denote the dataset by $(x_i, y_i)$ with $i = 1, 2, \ldots, 60$. Then it follows from Example 3 that the statistical invariants of the uncertain logistic decay model (20) corresponding to the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 60$, are
$$h_i(\beta_0, \beta_1, \beta_2, \beta_3, \sigma) = \frac{y_i - \beta_0 - \dfrac{\beta_1}{1 + \beta_2 \exp(\beta_3 x_i)}}{\sigma}, \quad i = 1, 2, \ldots, 60.$$
By using the conclusion of Example 6, we can infer that the least absolute deviation estimations of $\beta_0, \beta_1, \beta_2, \beta_3$ and $\sigma$ are the optimal solutions of the following optimization problem,
$$\min_{\beta_0, \beta_1, \beta_2, \beta_3,\ \sigma > 0} \sum_{i=1}^{60} \left| \frac{\sum_{j=1}^{60} I\left(y_j - \dfrac{\beta_1}{1 + \beta_2 \exp(\beta_3 x_j)} \le y_i - \dfrac{\beta_1}{1 + \beta_2 \exp(\beta_3 x_i)}\right)}{60} - \left( \exp\left( -\frac{\pi \left( y_i - \beta_0 - \dfrac{\beta_1}{1 + \beta_2 \exp(\beta_3 x_i)} \right)}{\sqrt{3}\,\sigma} \right) + 1 \right)^{-1} \right|,$$
which can be obtained as
$$\hat{\beta}_0 = 5.1207, \quad \hat{\beta}_1 = 0.0218, \quad \hat{\beta}_2 = -2.4583, \quad \hat{\beta}_3 = 32.6027, \quad \hat{\sigma} = 0.3794.$$
Based on this, a fitted uncertain logistic decay model can be obtained as
$$y = 5.1207 + \frac{0.0218}{1 - 2.4583 \exp(32.6027 x)} + \mathcal{N}(0, 0.3794). \tag{21}$$
To test the suitability of the fitted uncertain logistic decay model (21) for the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 60$, the residuals of the estimated model corresponding to the dataset can be calculated by means of
$$\varepsilon_i = y_i - \hat{\beta}_0 - \frac{\hat{\beta}_1}{1 + \hat{\beta}_2 \exp(\hat{\beta}_3 x_i)}, \quad i = 1, 2, \ldots, 60,$$
which are shown in Table 5 and Figure 4. Setting the significance level to $\alpha = 0.1$, it follows from the uncertain hypothesis test proposed by Ye and Liu [27] that the corresponding standardized critical value is
$$\pm \frac{\sqrt{3}\,\hat{\sigma}}{\pi} \ln \frac{\alpha}{2 - \alpha} = \pm 0.6159,$$
and the test is
$$W = \left\{ (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{60}) : \text{there are at least 7 indexes } i \text{ with } 1 \le i \le 60 \text{ such that } \varepsilon_i < -0.6159 \text{ or } \varepsilon_i > 0.6159 \right\}.$$
As shown in Table 5 and Figure 4, only $\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_{34}, \varepsilon_{37}$ are outside the interval $[-0.6159, 0.6159]$, so the condition that at least 7 residuals satisfy $|\varepsilon_i| > 0.6159$ is not met. Thus
$$(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{60}) \notin W,$$
and we fail to reject the compatibility assumption of the estimated model at significance level $\alpha = 0.1$. This result indicates that the fitted uncertain logistic decay model (21) fits the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 60$, well.
Table 5. Residuals of the fitted uncertain logistic decay model (21) corresponding to the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 60$.
Figure 4. Residuals of the fitted uncertain logistic decay model (21) corresponding to the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 60$.
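The critical value and the rejection count used in the test above follow from the inverse of the normal uncertainty distribution. The short Python sketch below reproduces this arithmetic under stated assumptions: the helper names are hypothetical, and the rule "reject when more than $\alpha n$ residuals fall outside the interval" is inferred from the counts quoted in the text rather than restated from Ye and Liu [27].

```python
import math


def normal_uncertainty_inverse(alpha, sigma, e=0.0):
    """Inverse normal uncertainty distribution:
    e + (sqrt(3) * sigma / pi) * ln(alpha / (1 - alpha))."""
    return e + math.sqrt(3.0) * sigma / math.pi * math.log(alpha / (1.0 - alpha))


def critical_interval(sigma_hat, alpha):
    """Two-sided interval [Phi^{-1}(alpha/2), Phi^{-1}(1 - alpha/2)] for N(0, sigma_hat)."""
    lower = normal_uncertainty_inverse(alpha / 2.0, sigma_hat)
    upper = normal_uncertainty_inverse(1.0 - alpha / 2.0, sigma_hat)
    return lower, upper


def min_outliers_to_reject(n, alpha):
    """Smallest number of residuals outside the interval that exceeds alpha * n.
    The small offset guards against floating-point error in alpha * n."""
    return math.floor(alpha * n + 1e-9) + 1


lower, upper = critical_interval(0.3794, 0.1)
print(lower, upper)                     # close to -0.6159 and 0.6159
print(min_outliers_to_reject(60, 0.1))  # 7
```

With five residuals outside the interval and a rejection count of 7, the sketch reaches the same conclusion as the text: the fitted model is not rejected.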
Finally, based on the dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 60$, in Table 4 and Figure 3, we apply existing parameter estimation methods for uncertain regression models, namely the least squares estimation method studied by Wang et al. [28] and the uncertain maximum likelihood estimation method explored by Liu and Qin [21], to perform statistical inference on the uncertain logistic decay model (20), and obtain
$$y = 99.1773 + \frac{24.9373}{1 - 0.1450 \exp(32.9138 x)} + \mathcal{N}(0, 1.0417) \tag{22}$$
estimated based on the least squares estimation method and
$$y = 4.9204 + \frac{17.5329}{1 + 156.3587 \exp(11.2862 x)} + \mathcal{N}(0, 7.6665) \tag{23}$$
estimated based on the uncertain maximum likelihood estimation method, respectively. To further compare the performance of the models, the mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) of models (21)–(23) on the same dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 60$, are calculated, and the corresponding calculation formulas are as follows,
$$\mathrm{MSE} = \frac{1}{60} \sum_{i=1}^{60} \left( \hat{\beta}_0 + \frac{\hat{\beta}_1}{1 + \hat{\beta}_2 \exp(\hat{\beta}_3 x_i)} - y_i \right)^2,$$
$$\mathrm{MAE} = \frac{1}{60} \sum_{i=1}^{60} \left| \hat{\beta}_0 + \frac{\hat{\beta}_1}{1 + \hat{\beta}_2 \exp(\hat{\beta}_3 x_i)} - y_i \right|$$
and
$$\mathrm{MAPE} = \frac{1}{60} \sum_{i=1}^{60} \left| \frac{\hat{\beta}_0 + \dfrac{\hat{\beta}_1}{1 + \hat{\beta}_2 \exp(\hat{\beta}_3 x_i)} - y_i}{y_i} \right|,$$
respectively. Table 6 lists the comparison results of the error indices for these three estimated models. The results in Table 6 show that the MSE, MAE and MAPE of model (21) are all lower than those of models (22) and (23). This difference indicates that, on this dataset, the least absolute deviation estimation method proposed in this paper fits the data better than the least squares and uncertain maximum likelihood estimation methods.
Table 6. The MSEs, MAEs and MAPEs of estimated uncertain logistic decay models (21)–(23).
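The three error indices compared in Table 6 are simple averages over the fitted values and can be computed as in the following Python sketch. The function name `error_indices` and the toy numbers are assumptions made for illustration; MAPE is returned as a fraction, matching the formula above, and requires every observed value to be nonzero.

```python
def error_indices(predictions, observations):
    """Return (MSE, MAE, MAPE) of predictions against observations."""
    n = len(observations)
    errors = [p - y for p, y in zip(predictions, observations)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides each error by the observed value, so y must be nonzero.
    mape = sum(abs(e / y) for e, y in zip(errors, observations)) / n
    return mse, mae, mape


# Hypothetical fitted and observed values, not taken from Table 4.
mse, mae, mape = error_indices([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
print(mse, mae, mape)
```

Applying the same function to the fitted values of each of the three models on the same 60 observations would give the kind of comparison that Table 6 reports.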

6. Application in Sport Statistics

In sports statistics, scientifically quantifying the complex relationships among athletes’ physiological indicators, athletic performance, and economic value is crucial for training optimization, talent selection, and market evaluation. However, sports data are often affected by measurement errors, individual differences, and outliers. Traditional regression methods are sensitive to outliers, which can weaken a model’s robustness and explanatory power. Therefore, this section applies the uncertain regression model together with the least absolute deviation criterion to improve the stability and reliability of statistical inference under non-ideal data environments in sports statistics.
Specifically, this section focuses on two typical sports statistics scenarios. The first is modeling the relationship between athletes’ physiological indicators and athletic performance. We select the weight, speed and agility performance, and flexibility and strength performance of adolescent athletes as explanatory variables to explore their relationship with lung capacity. In this scenario, individual physiological compensation and fluctuations in test conditions can easily lead to anomalous observations. The uncertain regression model combined with the least absolute deviation criterion is intended to limit the influence of such outliers on the parameter estimates and thus to reveal the underlying relationships more robustly. The second is predicting the market value of FIFA football players. Player market value is influenced by multiple variables, including age, overall rating, international reputation, weak foot, and skill moves, and the data often include a few high-leverage observations (such as superstar players). Traditional regression methods are easily affected by such extreme values, leading to systematic biases in prediction. This section therefore constructs an uncertain regression model based on the least absolute deviation criterion to accommodate unconventional observations and to improve the robustness and practical reference value of market value prediction. Through empirical analysis of these two scenarios, this section illustrates the use of least absolute deviation estimation in sports statistics, providing a robust analytical method for modeling complex sports data and promoting the integration of statistical methods and sports science.

6.1. Uncertain Lung Capacity Model

To investigate the relationship between weight, speed and agility performance, flexibility and strength performance and lung capacity of young athletes, Li [29] collected relevant data from 18 young athletes, as shown in Table 7.
Table 7. Dataset of the weight, speed and agility performance, flexibility and strength performance and lung capacity of 18 young athletes.
Let the lung capacity of the adolescent athletes be the response variable $y$, and let the weight ($x_1$), speed and agility performance ($x_2$), and flexibility and strength performance ($x_3$) of the adolescent athletes be the explanatory variables. Represent the dataset in Table 7 as
$$(y_i, x_{i1}, x_{i2}, x_{i3}), \quad i = 1, 2, \ldots, 18.$$
Then we use the following uncertain linear regression model,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon, \tag{24}$$
to fit the dataset $(y_i, x_{i1}, x_{i2}, x_{i3})$, $i = 1, 2, \ldots, 18$. In this model, $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ are parameters to be estimated, and $\varepsilon$ is an uncertain disturbance term, assumed to follow a normal uncertainty distribution with an expected value of 0 and a standard deviation of $\sigma$.
It follows from Example 1 that the statistical invariants of the uncertain linear regression model (24) corresponding to the dataset $(y_i, x_{i1}, x_{i2}, x_{i3})$, $i = 1, 2, \ldots, 18$, are
$$h_i(\beta_0, \beta_1, \beta_2, \beta_3, \sigma) = \frac{y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3}}{\sigma}, \quad i = 1, 2, \ldots, 18.$$
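By using the conclusion of Example 4, we can infer that the least absolute deviation estimations of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma$ are the optimal solutions of the following optimization problem,
$$\min_{\beta_0, \beta_1, \beta_2, \beta_3, \sigma > 0} \sum_{i=1}^{18} \left| \frac{\sum_{j=1}^{18} I\left( y_j - \sum_{l=1}^{3} \beta_l x_{jl} \le y_i - \sum_{l=1}^{3} \beta_l x_{il} \right)}{18} - \left( \exp\left( -\frac{\pi \left( y_i - \beta_0 - \sum_{l=1}^{3} \beta_l x_{il} \right)}{\sqrt{3}\,\sigma} \right) + 1 \right)^{-1} \right|,$$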
which can be obtained as
$$\hat{\beta}_0 = 1472.9510, \quad \hat{\beta}_1 = 5.3044, \quad \hat{\beta}_2 = -5.5439, \quad \hat{\beta}_3 = 17.5744, \quad \hat{\sigma} = 25.4603.$$
Based on this, a fitted uncertain lung capacity model can be obtained as
$$y = 1472.9510 + 5.3044 x_1 - 5.5439 x_2 + 17.5744 x_3 + \mathcal{N}(0, 25.4603). \tag{25}$$
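The text does not state which numerical solver was used to obtain these estimates. As one possible approach, the sketch below minimizes the least absolute deviation objective of an uncertain linear regression model with a basic compass (coordinate pattern) search, a derivative-free method chosen here because the indicator makes the objective piecewise. The function names, the starting point, the step settings and the toy data are all assumptions for illustration; a search of this kind only finds a local improvement and is not guaranteed to reach the global optimum.

```python
import math


def lad_objective(params, rows, ys):
    """LAD objective for y = b0 + sum_l b_l * x_l + eps, with eps ~ N(0, sigma).
    params = [b0, b1, ..., bp, sigma]."""
    *betas, sigma = params
    if sigma <= 0:
        return math.inf  # enforce the constraint sigma > 0
    residuals = [y - betas[0] - sum(b * x for b, x in zip(betas[1:], row))
                 for row, y in zip(rows, ys)]
    n = len(residuals)
    scale = math.pi / (math.sqrt(3.0) * sigma)
    total = 0.0
    for r_i in residuals:
        empirical = sum(1 for r_j in residuals if r_j <= r_i) / n
        t = -scale * r_i
        phi = 0.0 if t > 700.0 else 1.0 / (1.0 + math.exp(t))
        total += abs(empirical - phi)
    return total


def compass_search(objective, start, step=1.0, tol=1e-4, max_iter=2000):
    """Try +/- step along each coordinate, keep strict improvements,
    and halve the step when no coordinate move improves the objective."""
    best, best_val = list(start), objective(start)
    for _ in range(max_iter):
        improved = False
        for k in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[k] += delta
                val = objective(trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0
            if step < tol:
                break
    return best, best_val


# Hypothetical data with two explanatory variables; the last response is an outlier.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0],
        [5.0, 6.0], [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]]
ys = [4.2, 5.3, 9.1, 10.4, 14.3, 15.2, 19.1, 25.0]
start = [0.0, 1.0, 1.0, 1.0]  # b0, b1, b2, sigma
start_value = lad_objective(start, rows, ys)
estimate, value = compass_search(lambda p: lad_objective(p, rows, ys), start)
print(estimate, value)
```

In practice, one would probably restart the search from several starting points, or use an established derivative-free optimizer, and compare the resulting objective values.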
To test the suitability of the fitted uncertain lung capacity model (25) for the dataset $(y_i, x_{i1}, x_{i2}, x_{i3})$, $i = 1, 2, \ldots, 18$, the residuals of the estimated model corresponding to the dataset can be calculated by
$$\varepsilon_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \hat{\beta}_3 x_{i3}, \quad i = 1, 2, \ldots, 18,$$
which are shown in Table 8 and Figure 5. Setting the significance level to $\alpha = 0.1$, it follows from the uncertain hypothesis test proposed by Ye and Liu [27] that the corresponding standardized critical value is
$$\pm \frac{\sqrt{3}\,\hat{\sigma}}{\pi} \ln \frac{\alpha}{2 - \alpha} = \pm 41.3310,$$
and the test is
$$W = \left\{ (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{18}) : \text{there are at least 2 indexes } i \text{ with } 1 \le i \le 18 \text{ such that } \varepsilon_i < -41.3310 \text{ or } \varepsilon_i > 41.3310 \right\}.$$
As shown in Table 8 and Figure 5, only $\varepsilon_{15}$ is outside the interval $[-41.3310, 41.3310]$, which does not meet the condition that the number of outliers with $|\varepsilon_i| > 41.3310$ is at least 2. Thus
$$(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{18}) \notin W,$$
and the compatibility of the estimated model with the data therefore cannot be rejected at the significance level $\alpha = 0.1$. This result indicates that the fitted uncertain lung capacity model (25) fits the dataset $(y_i, x_{i1}, x_{i2}, x_{i3})$, $i = 1, 2, \ldots, 18$, well.
Table 8. Residuals of the fitted uncertain lung capacity model (25) corresponding to the dataset $(y_i, x_{i1}, x_{i2}, x_{i3})$, $i = 1, 2, \ldots, 18$.
Figure 5. Residuals of the fitted uncertain lung capacity model (25) corresponding to the dataset $(y_i, x_{i1}, x_{i2}, x_{i3})$, $i = 1, 2, \ldots, 18$.

6.2. Uncertain FIFA Football Player Valuation Model

To investigate the relationship between age, overall rating, international reputation, weak foot, skill moves and player market value, we collected data on 28 football players from the FIFA 23 dataset [30], as shown in Table 9.
Table 9. Dataset of the age, overall rating, international reputation, weak foot, skill moves and the player market value of 28 football players.
Let the market value of the football players be the response variable $y$, and let the age ($x_1$), overall rating ($x_2$), international reputation ($x_3$), weak foot ($x_4$), and skill moves ($x_5$) of the football players be the explanatory variables. Represent the dataset in Table 9 as
$$(y_i, x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5}), \quad i = 1, 2, \ldots, 28.$$
Then we use the following uncertain linear regression model,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon, \tag{26}$$
to fit the dataset $(y_i, x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$, $i = 1, 2, \ldots, 28$. In this model, $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ are parameters to be estimated, and $\varepsilon$ is an uncertain disturbance term, assumed to follow a normal uncertainty distribution with an expected value of 0 and a standard deviation of $\sigma$.
It follows from Example 1 that the statistical invariants of the uncertain linear regression model (26) corresponding to the dataset $(y_i, x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$, $i = 1, 2, \ldots, 28$, are
$$h_i(\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \sigma) = \frac{y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3} - \beta_4 x_{i4} - \beta_5 x_{i5}}{\sigma}$$
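with $i = 1, 2, \ldots, 28$. By using the conclusion of Example 4, we can infer that the least absolute deviation estimations of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\beta_5$ and $\sigma$ are the optimal solutions of the following optimization problem,
$$\min_{\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \sigma > 0} \sum_{i=1}^{28} \left| \frac{\sum_{j=1}^{28} I\left( y_j - \sum_{l=1}^{5} \beta_l x_{jl} \le y_i - \sum_{l=1}^{5} \beta_l x_{il} \right)}{28} - \left( \exp\left( -\frac{\pi \left( y_i - \beta_0 - \sum_{l=1}^{5} \beta_l x_{il} \right)}{\sqrt{3}\,\sigma} \right) + 1 \right)^{-1} \right|,$$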
which can be obtained as
$$\hat{\beta}_0 = 380.3405, \quad \hat{\beta}_1 = -0.7326, \quad \hat{\beta}_2 = 5.0971, \quad \hat{\beta}_3 = -0.0499, \quad \hat{\beta}_4 = -1.7853, \quad \hat{\beta}_5 = 2.3416,$$
and
$$\hat{\sigma} = 7.4316.$$
Based on this, a fitted uncertain FIFA football player valuation model can be obtained as
$$y = 380.3405 - 0.7326 x_1 + 5.0971 x_2 - 0.0499 x_3 - 1.7853 x_4 + 2.3416 x_5 + \mathcal{N}(0, 7.4316). \tag{27}$$
To test the suitability of the fitted uncertain FIFA football player valuation model (27) for the dataset $(y_i, x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$, $i = 1, 2, \ldots, 28$, the residuals of the estimated model corresponding to the dataset can be calculated by
$$\varepsilon_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \hat{\beta}_3 x_{i3} - \hat{\beta}_4 x_{i4} - \hat{\beta}_5 x_{i5}, \quad i = 1, 2, \ldots, 28,$$
which are shown in Table 10 and Figure 6. Setting the significance level to $\alpha = 0.05$, it follows from the uncertain hypothesis test proposed by Ye and Liu [27] that the corresponding standardized critical value is
$$\pm \frac{\sqrt{3}\,\hat{\sigma}}{\pi} \ln \frac{\alpha}{2 - \alpha} = \pm 15.0106,$$
and the test is
$$W = \left\{ (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{28}) : \text{there are at least 2 indexes } i \text{ with } 1 \le i \le 28 \text{ such that } \varepsilon_i < -15.0106 \text{ or } \varepsilon_i > 15.0106 \right\}.$$
As shown in Table 10 and Figure 6, only $\varepsilon_5$ is outside the interval $[-15.0106, 15.0106]$, which does not meet the condition that the number of outliers with $|\varepsilon_i| > 15.0106$ is at least 2. Thus
$$(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{28}) \notin W,$$
and the compatibility of the estimated model with the data therefore cannot be rejected at the significance level $\alpha = 0.05$. This result indicates that the fitted uncertain FIFA football player valuation model (27) fits the dataset $(y_i, x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$, $i = 1, 2, \ldots, 28$, well.
Table 10. Residuals of the fitted uncertain FIFA football player valuation model (27) corresponding to the dataset $(y_i, x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$, $i = 1, 2, \ldots, 28$.
Figure 6. Residuals of the fitted uncertain FIFA football player valuation model (27) corresponding to the dataset $(y_i, x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$, $i = 1, 2, \ldots, 28$.
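The residual check described in this subsection can be carried out end to end for any fitted uncertain linear regression model. The following Python sketch is a hedged illustration: `residuals_linear` and `compatibility_check` are hypothetical helper names, the rejection count is inferred from the rule "at least 2 of 28" quoted above, and the residual vector is made up so that only the fifth entry lies outside the interval, since the values of Table 10 are not reproduced here.

```python
import math


def residuals_linear(betas, rows, ys):
    """Residuals y_i - b0 - sum_l b_l * x_il for betas = [b0, b1, ..., bp]."""
    return [y - betas[0] - sum(b * x for b, x in zip(betas[1:], row))
            for row, y in zip(rows, ys)]


def compatibility_check(residuals, sigma_hat, alpha):
    """Flag residuals outside the two-sided interval and apply the count rule."""
    bound = math.sqrt(3.0) * sigma_hat / math.pi * math.log((2.0 - alpha) / alpha)
    outside = [i + 1 for i, r in enumerate(residuals) if abs(r) > bound]  # 1-based
    # Assumed rule: reject when more than alpha * n residuals fall outside.
    threshold = math.floor(alpha * len(residuals) + 1e-9) + 1
    return {"bound": bound, "outside": outside,
            "threshold": threshold, "rejected": len(outside) >= threshold}


# Tiny check of the residual formula on one hypothetical observation.
print(residuals_linear([1.0, 2.0, 3.0], [[1.0, 1.0]], [7.0]))  # [1.0]

# Made-up residuals for 28 observations: only the fifth exceeds the bound.
residuals = [0.0] * 28
residuals[4] = 20.0
result = compatibility_check(residuals, sigma_hat=7.4316, alpha=0.05)
print(result)
```

The computed bound is close to the value 15.0106 reported above; small differences in the last digit can arise because $\hat{\sigma}$ is rounded to four decimals.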

7. Conclusions

For the purpose of enriching the statistical inference study in uncertain regression analysis, a symmetric statistical invariant for the uncertain regression model based on observed data and uncertain disturbance terms was constructed in this paper. Based on the constructed statistical invariant, the least absolute deviation criterion was also applied to propose the least absolute deviation estimation for the uncertain regression model. Following that, the least absolute deviation estimation of the uncertain regression model was applied to the uncertain linear regression model, uncertain exponential growth model, and uncertain logistic decay model, respectively, and the advantages of the proposed method were also illustrated with two numerical examples by comparing with existing methods. Finally, the proposed method was also applied to two typical scenarios in sports statistics, and the corresponding uncertain statistical models were studied based on real data to illustrate the application effect.
In addition, future research directions could focus on the numerical solution algorithms for the corresponding least absolute deviation estimations, error analysis, and application research in typical cognitive uncertainty scenarios such as social statistics and psychostatistics.

Funding

This work was supported by the National Team Science and Technology Support Youth Project of the General Administration of Sport of China (No. 24QN022).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest that relate to the research described in this paper. Neither the entire paper nor any part of its content has been published or has been accepted elsewhere. It is also not being submitted to any other journal.

References

  1. Galton, F. Family likeness in stature. Proc. R. Soc. Lond. 1886, 40, 42–73.
  2. Draper, N.R.; Smith, H. Applied Regression Analysis, 2nd ed.; John Wiley and Sons: New York, NY, USA, 1981.
  3. Berkson, J. Application of the logistic function to bio-assay. J. Am. Stat. Assoc. 1944, 39, 357–365.
  4. Hoerl, A.; Kennard, R. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
  5. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
  6. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
  7. Freund, R.; Wilson, W.; Sa, P. Regression Analysis; Elsevier: Amsterdam, The Netherlands, 2006.
  8. Sen, A.; Srivastava, M. Regression Analysis: Theory, Methods and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
  9. Chatterjee, S.; Simonoff, J. Handbook of Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2013.
  10. Jiang, B.; Ye, T. Uncertain panel regression analysis with application to the impact of urbanization on electricity intensity. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 13017–13029.
  11. Liu, Y. Moment estimation for uncertain regression model with application to factors analysis of grain yield. Commun. Stat. Simul. Comput. 2024, 53, 4936–4946.
  12. Liu, B. Uncertainty Theory, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2007.
  13. Liu, B. Some research problems in uncertainty theory. J. Uncertain Syst. 2009, 3, 3–10.
  14. Yao, K.; Liu, B. Uncertain regression analysis: An approach for imprecise observations. Soft Comput. 2018, 22, 5579–5582.
  15. Ye, T.; Liu, Y.H. Multivariate uncertain regression model with imprecise observations. J. Ambient. Intell. Humaniz. Comput. 2020, 11, 4941–4950.
  16. Chen, D. Uncertain regression model with autoregressive time series errors. Soft Comput. 2021, 25, 14549–14559.
  17. Chen, D. Uncertain regression model with moving average time series errors. Commun. Stat. Theory Methods 2023, 52, 7632–7646.
  18. Ding, J.; Zhang, Z. Statistical inference on uncertain nonparametric regression model. Fuzzy Optim. Decis. Mak. 2021, 20, 451–469.
  19. Liu, Z.; Yang, Y. Least absolute deviations estimation for uncertain regression with imprecise observations. Fuzzy Optim. Decis. Mak. 2020, 19, 33–52.
  20. Lio, W.; Liu, B. Uncertain maximum likelihood estimation with application to uncertain regression analysis. Soft Comput. 2020, 24, 9351–9360.
  21. Liu, Y.; Qin, Z. Modified maximum likelihood approach in uncertain regression analysis and application to factors analysis of urban air quality. Math. Comput. Simul. 2025, 234, 219–234.
  22. Chen, D. Tukey’s biweight estimation for uncertain regression model with imprecise observations. Soft Comput. 2020, 24, 16803–16809.
  23. Lio, W.; Liu, B. Residual and confidence interval for uncertain regression model with imprecise observations. J. Intell. Fuzzy Syst. 2018, 35, 2573–2583.
  24. Liu, Y.; Liu, B. A modified uncertain maximum likelihood estimation with applications in uncertain statistics. Commun. Stat. Theory Methods 2024, 53, 6649–6670.
  25. Liu, Y.; Liu, B. Estimation of uncertainty distribution function by the principle of least squares. Commun. Stat. Theory Methods 2024, 53, 7624–7641.
  26. Ning, S.; Liu, Y. Estimation of unknown parameters in uncertainty distribution via the least absolute deviation principle and its application in uncertain statistics. Commun. Stat. Simul. Comput. 2025. accepted.
  27. Ye, T.; Liu, B. Uncertain hypothesis test with application to uncertain regression analysis. Fuzzy Optim. Decis. Mak. 2022, 21, 157–174.
  28. Wang, H.; Liu, Y.; Shi, H. Estimating unknown parameters and disturbance term in uncertain regression models by the principle of least squares. Symmetry 2024, 16, 1182.
  29. Li, C. The application of Excel multivariate linear regression in sports statistics. China Manag. Inform. 2011, 14, 65–66.
  30. FIFA23 Official Dataset. 2025. Available online: https://www.kaggle.com/datasets/bryanb/fifa-player-stats-database (accessed on 11 September 2025).
