Article

Influence Analysis in the Lognormal Regression Model with Fitted and Quantile Residuals

by
Muhammad Habib
1,*,
Muhammad Amin
1 and
Sadiah M. A. Aljeddani
2
1
Department of Statistics, University of Sargodha, Sargodha 40100, Pakistan
2
Mathematics Department, Al-Lith University College, Umm Al-Qura University, Al-Lith 21961, Saudi Arabia
*
Author to whom correspondence should be addressed.
Axioms 2025, 14(6), 464; https://doi.org/10.3390/axioms14060464
Submission received: 14 April 2025 / Revised: 9 June 2025 / Accepted: 11 June 2025 / Published: 13 June 2025
(This article belongs to the Special Issue Advances in the Theory and Applications of Statistical Distributions)

Abstract

Influence analysis is a critical diagnostic tool in regression modeling to ensure reliable parameter estimates. This study evaluates the effectiveness of diagnostic methods for detecting influential observations in the lognormal regression model using fitted and quantile residuals. We assess Cook’s distance, modified Cook’s distance, covariance ratio, and the Hadi method through a Monte Carlo simulation with varying sample sizes, dispersion parameters, perturbation values, and numbers of explanatory variables, and a real-world application to an atmospheric environmental dataset. Simulation results demonstrate that Cook’s distance and the Hadi method achieve a good performance under all scenarios, with quantile residuals generally outperforming fitted residuals. The sensitivity analysis confirms their robustness, with minimal variation in detection rates. The covariance ratio performs well but shows slight variability in high-dispersion cases, while modified Cook’s distance consistently underperforms, particularly with quantile residuals. The real-world application confirms these findings, with Cook’s distance and the Hadi method effectively identifying influential points affecting ozone concentration estimates. These results highlight the superiority of Cook’s distance and the Hadi method for lognormal regression model diagnostics, with quantile residuals enhancing detection accuracy.

1. Introduction

Regression analysis is a collection of statistical procedures used for investigating and modeling the relationship between a response variable and covariates. The main objectives of regression are to predict the response variable and to estimate the unknown regression parameters. The evolution of regression analysis has extended far beyond its initial use in the natural sciences; it has become a mainstay in the social sciences, economics, finance, and many other disciplines [1]. Different types of regression models are used to analyze the relationship between response and explanatory variables, including the linear regression model (LRM), nonlinear regression model (NLRM), generalized linear model (GLM), and nonparametric regression model (NPRM).
In regression analysis, when the response variable follows the lognormal distribution, we prefer the lognormal regression model (LNRM) over the LRM. The model is used in reliability engineering to model the lifespan of components and systems, especially when the failure times are multiplicatively related to several factors [2]. In medical research, it is used to model health outcomes based on income and environmental pollutants, and to model species diversity based on environmental factors [3]. Unlike the LRM, which assumes a normally distributed response, or the GLM, which accommodates various exponential family distributions, the LNRM is specifically designed for positively skewed, lognormally distributed responses. The skewness of the LNRM is directly related to its variance. This skewness introduces unique challenges in regression diagnostics, as influential observations can disproportionately affect parameter estimates due to the multiplicative nature of the lognormal distribution.
One of the important assumptions in regression modeling is that no observation should influence the regression estimates. In the regression model, some points have a significant impact on the regression estimates. The presence of influential observations in the dataset causes an effect on parameter estimates and their standard errors. We must detect and address these points and then refit the model for more reliable results [4]. The main objective of influence diagnostics is to determine the unique impact of influential observations on the conclusions and the fitted model [5]. Influence diagnostics are vital to identifying the specific impact of influential observations on regression model fitting [6].
Many authors have worked on diagnostics for different regression models. Cook [7] first introduced the Cook's distance method and examined the contribution of studentized residuals and their variance for influence diagnostics in the LRM. Chatterjee and Hadi [8] examined how to find outliers, high-leverage points, and influential observations in the LRM. Balasooriya et al. [9] evaluated various methods for identifying influential observations and outliers in linear regression through an empirical study of seven statistical methods applied to six datasets. Brown and Lawrence [10] described regression diagnostics for the identification of influential observations in the LRM. Meloun and Militký [11] investigated influential observations and outliers in the LRM based on different residuals and influence diagnostic procedures. Nurunnabi et al. [12] introduced diagnostic measures for influential observations in the LRM. Vanegas et al. [13] introduced diagnostic tools based on Cook's distance and related metrics for determining influential observations in the generalized Weibull linear model. Jang and Anderson-Cook [14] proposed firework plots for evaluating the impact of outliers and influential observations in the GLM. Zhang [15] described residuals and regression diagnostics with a focus on outliers, leverage, and influential observations in logistic regression. Amin et al. [16] evaluated residuals for the inverse Gaussian regression model to identify influential observations. Bae et al. [17] developed a new influence measure for the LRM using the deletion method. Amin et al. [18] developed adjusted deviance residuals in the gamma regression model for influence diagnostics. Imran and Akbar proposed diagnostics using partial residual plots in inverse Gaussian regression [19]. Khaleeq et al. [20] proposed influence diagnostics in the LNRM with censored data. Khan et al. [21] introduced Poisson regression residuals for the assessment of influential observations. Amin et al. [22] introduced diagnostic techniques to detect influential points in the logistic regression model with different link functions. Camilleri et al. [23] identified influential observations in the multiple regression model. Soale [24] discussed influential observations in the single index Fréchet regression model. Khan et al. [25] compared the performance of several residuals for Cook's distance in the beta regression model and recommended deviance residual-based influence diagnostics over other residuals.
This research is motivated by the available literature on various diagnostic techniques for the LRM and the GLM to detect influential observations. If the response variable follows the lognormal distribution and the explanatory variables contain extreme observations, then we need the best diagnostic method for the detection of such observations in the LNRM. These diagnostics include Cook's distance, the covariance ratio, modified Cook's distance, and the Hadi method. Consequently, the LNRM requires specialized diagnostic methods, such as those leveraging fitted and quantile residuals, to effectively detect influential points; these differ from the standardized residuals commonly used in the LRM and from the deviance and Pearson residuals in the GLM [4]. Such tailored diagnostics are critical to ensure robust model fitting and reliable inference in the presence of extreme observations. Thus, the purpose of this study is to compare the influence diagnostic techniques for the LNRM with fitted and quantile residuals.
The rest of the study is structured as follows. In Section 2, we discuss the methodology for estimating the LNRM and its diagnostics with residuals. In Section 3, we present numerical evaluations using a Monte Carlo simulation with different sample sizes, various dispersion parameters, and varying numbers of independent variables to examine the effectiveness of the diagnostic methods. In Section 4, a real-data example is considered, while in Section 5, we present the concluding remarks of the study.

2. Materials and Methods

In this section, we discuss the LNRM and its estimation, as well as different diagnostic methods for influential observations.

2.1. The Log-Normal Regression Model

In regression analysis, when the response variable follows the lognormal distribution, we prefer the LNRM over the LRM.
Suppose $y$ is a random variable that follows a lognormal distribution $LN(\theta, \sigma^2)$ with probability density function given as

$$f(y \mid \theta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}\, y} \exp\left[-\frac{(\ln y - \theta)^2}{2\sigma^2}\right], \quad y > 0,\ \theta > 0,\ \sigma > 0, \qquad (1)$$

where $\sigma$ is the scale parameter and $\theta$ is the location parameter of the LN distribution. The mean and the variance of Equation (1) are, respectively, given by

$$E(y) = \mu = e^{\theta + \sigma^2/2}, \qquad (2)$$

$$\operatorname{var}(y) = e^{2\theta + \sigma^2}\left(e^{\sigma^2} - 1\right). \qquad (3)$$
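As a quick numerical sanity check (not part of the original analysis), the mean and variance expressions above can be compared against a simulated LN sample; this sketch uses arbitrary illustrative values $\theta = 0.5$ and $\sigma = 0.8$:

```python
import math
import random

def lognormal_moments(theta, sigma):
    """Mean and variance of LN(theta, sigma^2) from the closed-form expressions."""
    mean = math.exp(theta + sigma**2 / 2)
    var = math.exp(2 * theta + sigma**2) * (math.exp(sigma**2) - 1)
    return mean, var

# Monte Carlo check: if ln(y) ~ N(theta, sigma^2), then y is LN(theta, sigma^2)
random.seed(0)
theta, sigma = 0.5, 0.8
sample = [math.exp(random.gauss(theta, sigma)) for _ in range(100_000)]
m, v = lognormal_moments(theta, sigma)
emp_mean = sum(sample) / len(sample)
print(round(m, 3), round(emp_mean, 3))  # theoretical vs. empirical mean
```

The empirical mean of the simulated sample agrees with the closed-form mean to within sampling error.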
In the LNRM structure, Equation (1) can be written as [26]

$$f(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}\, y} \exp\left[-\frac{(\ln y - \mu)^2}{2\sigma^2}\right], \quad y > 0,\ \mu > 0,\ \sigma > 0, \qquad (4)$$

where $\mu = e^{\theta + \sigma^2/2}$ and $\theta = X\beta$, with $X$ the design matrix of $p$ explanatory variables plus an intercept column and $\beta$ the $(p+1)$-vector of regression parameters. To fit the LNRM, we need to estimate $\beta$ and $\sigma$, and for this purpose we consider the maximum likelihood estimator (MLE). The MLE requires the likelihood and log-likelihood functions of the model.
Let $l(\mu, \sigma^2)$ denote the log-likelihood function. The likelihood function for Equation (4) can be written as

$$L = \prod_{i=1}^{n} f\left(y_i \mid \mu, \sigma^2\right) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2}\, y_i^{-1} \exp\left[-\frac{(\ln y_i - \mu)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-n/2} \prod_{i=1}^{n} y_i^{-1} \exp\left[-\sum_{i=1}^{n}\frac{(\ln y_i - \mu)^2}{2\sigma^2}\right]. \qquad (5)$$

Taking the natural logarithm of Equation (5), we have

$$l(\mu, \sigma^2) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \sum_{i=1}^{n}\ln y_i - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(\ln y_i - \mu)^2, \qquad (6)$$
where $\mu = e^{X\beta + \sigma^2/2}$. To obtain the MLE of the LNRM, we differentiate Equation (6) with respect to $\beta$:

$$\frac{\partial l(\mu, \sigma^2)}{\partial \beta} = -\frac{2}{2\sigma^2}\sum_{i=1}^{n}(\ln y_i - \mu_i)\frac{\partial}{\partial \beta}\left(\ln y_i - e^{X\beta + \sigma^2/2}\right) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(\ln y_i - \mu_i)\, e^{X_i\beta}X_i. \qquad (7)$$

Setting Equation (7) equal to zero, we have

$$\frac{\partial l(\mu_i, \sigma^2)}{\partial \beta_j} = \frac{1}{\sigma^2}\sum_{i=1}^{n} X_{ij}(\ln y_i - \mu_i)\, e^{X_i\beta} = 0. \qquad (8)$$
Since the solution of Equation (8) is nonlinear in the parameters, we estimate the unknown parameters through the iterative Fisher scoring method [27]. Let $\beta^{(r)}$ be the ML estimate of $\beta$ at iteration $r$, given by

$$\beta^{(r+1)} = \beta^{(r)} + I\left(\beta^{(r)}\right)^{-1} S\left(\beta^{(r)}\right), \qquad (9)$$

where $S(\beta^{(r)})$ is the score vector of size $(p+1)\times 1$ and $I(\beta^{(r)})$ is the Fisher information matrix, both evaluated at $\beta^{(r)}$. At convergence of Equation (9), the unknown coefficients can be written as

$$\hat{\beta}_{ML} = \left(X'\hat{W}X\right)^{-1} X'\hat{W}z, \qquad (10)$$

where $W = \operatorname{diag}(1/\hat{\sigma}_i)$, $\hat{\sigma}_i^2 = r_i^2$, $r = y - \hat{\mu}$, $z = \hat{\theta} + (y - \hat{\mu})/V(\hat{\mu})$, and $V(\hat{\mu})$ is the estimated variance function of the model.
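The estimator in Equation (10) is computed iteratively, but under the parameterization $\theta = X\beta$ the ML estimate of $\beta$ can equivalently be obtained by least squares on the log response, since $\ln y \sim N(X\beta, \sigma^2)$. The following minimal sketch (Python rather than the R used later in the study, with simulated data and arbitrary true coefficients) illustrates this shortcut:

```python
import numpy as np

def fit_lnrm(X, y):
    """Minimal ML fit of the LNRM with theta = X @ beta: because
    ln(y) ~ N(X beta, sigma^2), the MLE of beta coincides with
    least squares on the log response."""
    logy = np.log(y)
    beta_hat, *_ = np.linalg.lstsq(X, logy, rcond=None)
    resid = logy - X @ beta_hat
    sigma2_hat = resid @ resid / len(y)            # ML estimate of sigma^2
    mu_hat = np.exp(X @ beta_hat + sigma2_hat / 2) # fitted mean of y
    return beta_hat, sigma2_hat, mu_hat

# Simulated illustration with arbitrary true values beta = (0.5, 0.25)
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.25])
y = rng.lognormal(mean=X @ beta_true, sigma=0.3)
beta_hat, s2, mu_hat = fit_lnrm(X, y)
print(np.round(beta_hat, 2))
```

With 500 observations the estimates recover the true coefficients closely, which is consistent with the consistency of the MLE.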

2.2. Log-Normal Regression Residuals

A residual in regression analysis is the difference between the observed and predicted values of a model. It represents the error, or deviation, of the model's prediction from the actual data. Residuals are essential in the evaluation of influential observations [28]. The residual for the LRM can be written as

$$\hat{\varepsilon} = y - \hat{y}, \qquad (11)$$
where $\hat{y}$ is the predicted response of the LRM. Residuals are also used for assessing the adequacy of the fitted model. In the LRM, only one residual is available to evaluate its assumptions and model validity. In contrast, for the GLM, different types of residuals are available, including the Pearson, deviance, working, Anscombe, and likelihood residuals, among others. In the LNRM, fitted and quantile residuals are primarily used due to their compatibility with the lognormal distribution's properties and their effectiveness in diagnostic analysis. Fitted residuals are straightforward and directly measure the difference between observed and predicted values, making them intuitive for assessing model fit in the LNRM. Quantile residuals, on the other hand, are particularly advantageous because they transform the residuals into an asymptotically normal distribution, which facilitates the identification of influential observations and model diagnostics in non-normal models like the LNRM [29].
The choice of fitted and quantile residuals over Pearson or deviance residuals is justified by their specific advantages in the context of the LNRM. Pearson residuals, which standardize the raw residuals by the estimated standard deviation, can be problematic in the LNRM because the lognormal distribution’s skewed nature may lead to heteroscedasticity, making Pearson residuals less reliable for detecting influential points. Deviance residuals, commonly used in the GLMs, are based on the contribution of each observation to the model’s deviance, but their interpretation in the LNRM is less straightforward due to the lognormal distribution’s multiplicative structure. In contrast, fitted residuals are simple to compute and interpret, directly reflecting the model’s prediction errors. Quantile residuals address the skewness of the lognormal distribution by mapping the residuals to a standard normal scale, improving the detection of outliers and influential observations. This normality property makes quantile residuals particularly effective for diagnostic plots and statistical tests, offering a robust alternative to Pearson and deviance residuals in the LNRM [29]. This selection of fitted and quantile residuals enhances the reliability of influence diagnostics in the LNRM by addressing the distribution’s unique characteristics, ensuring more accurate detection of influential observations compared to Pearson or deviance residuals.

2.2.1. Fitted Residual

In the LNRM, this residual is the difference between the observed and fitted values of the response variable:

$$r_i = y_i - \hat{\mu}_i, \qquad (12)$$

where $\hat{\mu} = e^{X\hat{\beta}_{ML} + \hat{\sigma}^2/2}$ and $\hat{\sigma}^2 = \sum r_i^2/(n - p - 1)$.

2.2.2. Quantile Residual

The quantile residual is a simple and commonly used residual for diagnostic analysis in regression models [30]. It works well for GLMs whose deviance and Pearson residuals are non-normal, because the quantile residual is asymptotically normally distributed. It is generally defined as

$$r_{q_i} = \Phi^{-1}\left(F\left(y_i;\ \hat{\mu}_i, \hat{\sigma}^2\right)\right), \qquad (13)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, $F(\cdot)$ is the cumulative distribution function of the LN distribution, and $\hat{\mu}$ and $\hat{\sigma}$ are the ML estimates of the parameters $\mu$ and $\sigma$.
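The quantile residual definition above can be sketched as follows. Taking $F$ as the CDF of $LN(\hat{\theta}, \hat{\sigma}^2)$, we have $F(y) = \Phi((\ln y - \hat{\theta})/\hat{\sigma})$, so the quantile residual reduces exactly to the standardized residual on the log scale; the numeric arguments are arbitrary illustrations:

```python
import math
from statistics import NormalDist

def quantile_residual(y, theta_hat, sigma_hat):
    """Quantile residual Phi^{-1}(F(y; .)), taking F as the CDF of
    LN(theta_hat, sigma_hat^2). Since F(y) = Phi((ln y - theta)/sigma),
    this equals the standardized residual on the log scale."""
    nd = NormalDist()
    F = nd.cdf((math.log(y) - theta_hat) / sigma_hat)  # lognormal CDF
    return nd.inv_cdf(F)

print(round(quantile_residual(2.0, 0.5, 0.8), 4))  # equals (ln 2 - 0.5)/0.8
```

The round trip through the two CDFs makes the asymptotic normality explicit: a correctly specified model yields approximately standard normal quantile residuals.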

2.3. Influential Observation Detection Methods

It is necessary to analyze the fitted model before drawing conclusions from it. For this reason, a complete understanding of the model and the nature of the fitted model is needed [31]. Influence diagnostics are well suited to assessing the strength of the fitted model. Inference drawn from a model in the presence of influential points is not reliable, so it is necessary to address these points while fitting the model. The LNRM is particularly sensitive to influential observations due to the inherent skewness of the lognormal distribution. This skewness is directly related to the variance of the lognormal distribution and results in a heavier right tail, where extreme observations can have a magnified impact on the multiplicative relationships modeled by the LNRM. Influential points, especially those in the tail of the distribution or with high leverage in the explanatory variables, can disproportionately affect parameter estimates and model fit. Diagnostic methods such as Cook's distance, modified Cook's distance, the covariance ratio, and the Hadi method must therefore be robust to this skewness to accurately detect influential observations. The use of fitted and quantile residuals in these diagnostics helps mitigate the effects of skewness, with quantile residuals being particularly effective as they transform the residuals to an approximately normal scale, reducing the influence of extreme values [30]. For the LRM and GLM, various influence diagnostics such as DFFITS, the covariance ratio, and Cook's distance are available in the literature. Here, we adapt several diagnostic methods to the LNRM: the covariance ratio, Cook's distance, modified Cook's distance, and the Hadi method.

2.3.1. Cook’s Distance (D)

The D statistic was introduced by Cook for influence analysis in the LRM [7]. It calculates the total change in the fitted model when an observation is removed from it. Based on the theory of the LRM and GLM, the D statistic for the LNRM is computed as

$$D_i = \frac{\left(\hat{\beta}_{ML} - \hat{\beta}_{ML(i)}\right)' X'\hat{W}X \left(\hat{\beta}_{ML} - \hat{\beta}_{ML(i)}\right)}{(p+1)\hat{\sigma}^2}. \qquad (14)$$

After simplification, Equation (14) can be written as

$$D_i = \frac{r_i^2}{(p+1)\hat{\sigma}^2}\,\frac{h_{ii}}{(1 - h_{ii})^2}, \quad i = 1, 2, \ldots, n, \qquad (15)$$

where $h_{ii}$ is the $i$th diagonal element of the hat matrix, defined as

$$H = \hat{W}^{1/2} X \left(X'\hat{W}X\right)^{-1} X'\hat{W}^{1/2}. \qquad (16)$$

Furthermore, Equation (15) can alternatively be written as

$$D_i = \frac{r_i'^2}{(p+1)}\,\frac{h_{ii}}{1 - h_{ii}}, \qquad (17)$$

where $r_i' = r_i/\sqrt{\hat{\sigma}^2(1 - h_{ii})}$ is the standardized fitted residual. The D statistic is used to assess the impact of an influential observation on $\hat{\beta}$ [32]. Influential observations can be detected using Cook's distance with the cutoff point $D_i \geq F_{\alpha}(p+1, n-p-1)$. However, in the GLM this cutoff point fails to detect the influential points; Hardin and Hilbe [33] stated that the cutoff point for identifying an influential point in the GLM is $4/(n-p-1)$.
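As an illustration, Cook's distance in its standardized-residual form can be sketched as follows; the sketch assumes an unweighted hat matrix ($\hat{W} = I$) and least-squares residuals, whereas the LNRM uses the weighted hat matrix defined above:

```python
import numpy as np

def cooks_distance(X, resid, sigma2_hat):
    """Cook's distance D_i = r'_i^2/(p+1) * h_ii/(1-h_ii), with
    r'_i = r_i / sqrt(sigma2 * (1 - h_ii)) the standardized residual.
    An unweighted hat matrix (W = I) is assumed in this sketch."""
    n, p1 = X.shape                                # p1 = p + 1 parameters
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages h_ii
    r_std2 = resid**2 / (sigma2_hat * (1 - h))     # squared standardized residual
    return r_std2 / p1 * h / (1 - h)

# Arbitrary simulated illustration with the cutoff 4/(n-p-1)
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.3, size=n)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
D = cooks_distance(X, resid, sigma2)
print(np.sum(D > 4 / (n - X.shape[1] - 1)))  # number of flagged observations
```

In the least-squares case this simplified form agrees exactly with the deletion-based definition of the D statistic.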

2.3.2. Modified Cook’s Distance (MCD)

The modified Cook's distance $D^*$ was first introduced by [34]. It is computed for the LNRM as

$$D_i^* = \left[\frac{\left(\hat{\beta}_{ML} - \hat{\beta}_{ML(i)}\right)' X'\hat{W}X \left(\hat{\beta}_{ML} - \hat{\beta}_{ML(i)}\right)}{\frac{p+1}{n-p-1}\,\hat{\sigma}_{(i)}^2}\right]^{1/2}. \qquad (18)$$

Moreover, Equation (18) can be simplified as

$$D_i^* = \left[\frac{n-p-1}{p+1}\,\frac{h_{ii}}{(1 - h_{ii})^2}\,\frac{r_i^2}{\hat{\sigma}_{(i)}^2}\right]^{1/2}, \qquad (19)$$

where $\hat{\sigma}_{(i)} = \sqrt{\frac{n-p-r_i'^2}{n-p-1}}\,\hat{\sigma}$, so Equation (19) can also be written as

$$D_i^* = \left[\frac{n-p-1}{p+1}\,\frac{h_{ii}}{1 - h_{ii}}\right]^{1/2} |t_i|, \qquad (20)$$

where $t_i = r_i'\sqrt{\frac{n-p-1}{n-p-r_i'^2}}$ is also called the deletion (externally studentized) residual [35]. Observations are declared influential if $D_i^* \geq 2\sqrt{\frac{n-p-1}{n}}$.
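The deletion-residual form of Equation (20) can be sketched as below; as with the other sketches, an unweighted hat matrix ($\hat{W} = I$) is assumed for simplicity:

```python
import numpy as np

def modified_cooks_distance(X, resid, sigma2_hat):
    """Modified Cook's distance following Equation (20):
    D*_i = sqrt((n-p-1)/(p+1) * h_ii/(1-h_ii)) * |t_i|, with
    t_i = r'_i * sqrt((n-p-1)/(n-p-r'_i^2)) the deletion residual.
    An unweighted hat matrix (W = I) is assumed in this sketch."""
    n, p1 = X.shape                                # p1 = p + 1 parameters
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    r_std = resid / np.sqrt(sigma2_hat * (1 - h))  # standardized residual r'_i
    t = r_std * np.sqrt((n - p1) / (n - p1 + 1 - r_std**2))
    return np.sqrt((n - p1) / p1 * h / (1 - h)) * np.abs(t)
```

A gross outlier inflates the deletion residual $t_i$ and hence $D_i^*$, so the observation with the largest departure typically dominates this measure.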

2.3.3. Covariance Ratio

The covariance ratio evaluates the relative impact of every observation in the dataset on the regression coefficients [32]. The covariance ratio is used to measure how the precision of a model is affected by the exclusion of a specific observation or group of observations.
For the LNRM, the covariance ratio can be given as
$$CVR_i = \frac{\left|MSE_{(i)}\left(X_{(i)}'\hat{W}_{(i)}X_{(i)}\right)^{-1}\right|}{\left|MSE\left(X'\hat{W}X\right)^{-1}\right|}. \qquad (21)$$

After simplification, Equation (21) can be written as

$$CVR_i = \frac{\left(\frac{n-p-r_i'^2}{n-p-1}\right)^{p+1}}{1 - h_{ii}}. \qquad (22)$$

Belsley et al. [35] showed that the $i$th observation is influential if $CVR_i > 1 + \frac{3(p+1)}{n}$ or $CVR_i < 1 - \frac{3(p+1)}{n}$.
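The simplified covariance ratio and its cutoff band can be sketched as follows, again assuming an unweighted hat matrix ($\hat{W} = I$) for illustration:

```python
import numpy as np

def covariance_ratio(X, resid, sigma2_hat):
    """Covariance ratio CVR_i = ((n-p-r'_i^2)/(n-p-1))^(p+1) / (1-h_ii),
    with r'_i the standardized residual. W = I is assumed in this sketch."""
    n, p1 = X.shape                                # p1 = p + 1 parameters
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    r_std2 = resid**2 / (sigma2_hat * (1 - h))
    return ((n - p1 + 1 - r_std2) / (n - p1)) ** p1 / (1 - h)

def cvr_flags(cvr, n, p1):
    """Influential if CVR_i falls outside the band 1 +/- 3(p+1)/n."""
    band = 3 * p1 / n
    return (cvr > 1 + band) | (cvr < 1 - band)
```

A high-leverage point with a small residual pushes $CVR_i$ well above 1, reflecting the loss of estimation precision when that observation is deleted.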

2.3.4. Hadi Method

Hadi developed a method for the identification of influential observations in the LRM [36]. Based on his theory, this method for the LNRM is given as

$$Hd_i = \frac{p+1}{1 - h_{ii}}\,\frac{d_i^2}{1 - d_i^2} + \frac{h_{ii}}{1 - h_{ii}}, \quad i = 1, 2, \ldots, n, \qquad (23)$$

where $d_i = \hat{\varepsilon}_i/\sqrt{\sum \hat{\varepsilon}_i^2}$. For the LNRM, $d_i$ is replaced by $d_i = r_i/\sqrt{\sum r_i^2}$ in Equation (23). The Hadi method identifies the $i$th observation as influential if $Hd_i > \operatorname{mean}(Hd_i) + c\sqrt{\operatorname{Var}(Hd_i)}$, where $c$ may be 2 or 3.
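Equation (23) and its mean-plus-$c$-standard-deviations cutoff can be sketched as below; an unweighted hat matrix ($\hat{W} = I$) is assumed for this illustration:

```python
import numpy as np

def hadi_measure(X, resid):
    """Hadi's influence measure per Equation (23):
    Hd_i = (p+1)/(1-h_ii) * d_i^2/(1-d_i^2) + h_ii/(1-h_ii),
    with d_i = r_i / sqrt(sum(r^2)). W = I is assumed in this sketch."""
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    p1 = X.shape[1]                       # p + 1 parameters
    d2 = resid**2 / np.sum(resid**2)
    return p1 / (1 - h) * d2 / (1 - d2) + h / (1 - h)

def hadi_flags(Hd, c=2):
    """Flag observation i if Hd_i > mean(Hd) + c * sd(Hd), with c = 2 or 3."""
    return Hd > Hd.mean() + c * Hd.std(ddof=1)
```

The two additive terms separate the residual contribution from the leverage contribution, so a point that is extreme in either respect produces a large $Hd_i$.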

3. Results

In this section, we evaluate the performance of the influence diagnostics with fitted and quantile residuals on the basis of a simulation and a real-life application. In the simulation study, we analyze the influential observation diagnostic methods with different sample sizes, various dispersion parameters, and varying numbers of independent variables to compare the effectiveness of the diagnostic methods for the LNRM.

3.1. Simulation Layout

We consider the following simulation structure to compare the effectiveness of the studied influence diagnostics. The response variable of the LNRM is generated from the LN distribution as $y_i \sim LN(\mu, \sigma^2)$, where the mean function of the LNRM is $\mu = e^{X\beta + \sigma^2/2}$, $\sigma^2$ is the dispersion parameter, and we assume the arbitrary values $\sigma = 0.5, 1, 3, 9$. For the true values of the regression coefficients, there are different conventions: some authors use the eigenvector corresponding to the largest eigenvalue of $X'X$ [37], while others use arbitrary values for the true regression parameters. Here, we consider $\beta_0 = 0.05$, $\beta_1 = 0.0025$, $\beta_2 = 0.005$, $\beta_3 = \beta_4 = 0.0001$ [16]. The design matrix $X$ with no influential observations for sample sizes $n$ = 25, 50, 100, and 200 is generated as $x_{ij} \sim N(0, 1)$. We take four sets of independent variables, $p = 1, 2, 4, 8$, and then change the 5th observation of each independent variable into an influential observation as $x_{5j} = a_0 + x_{5j}$, where $a_0 = \tilde{x}_j + \zeta$ (perturbation) and $\zeta = 50, 100, 200$. To address the arbitrary choice of $\zeta = 100$, we conduct a sensitivity analysis by additionally setting $\zeta = 50$ and $\zeta = 200$ to evaluate the robustness of the diagnostic methods under different perturbation magnitudes. The effectiveness of the LNRM diagnostics for the detection of influential observations is then assessed over 1000 generated LN samples; the simulation is run 1000 times using R 4.3.1. To compare the performance of the LNRM diagnostics, the influential observation detection percentage is employed:

$$\mathrm{Detection}\,\% = \frac{\text{Total number of detected influential observations in } R \text{ replications}}{\text{Total number of replications}} \times 100.$$
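The simulation layout above can be sketched compactly. This is a simplified illustration only (Python instead of the R used in the study, fewer replications, $\theta = X\beta$ without the $\sigma^2/2$ shift, and Cook's distance on log-scale residuals), not a reproduction of the reported results:

```python
import numpy as np

def detection_percentage(n=50, p=2, sigma=0.5, zeta=100, reps=200, seed=0):
    """Monte Carlo sketch of the simulation layout: the 5th observation of
    each regressor is shifted by its median plus zeta, and we record how
    often Cook's distance (cutoff 4/(n-p-1)) flags it. Simplifications:
    beta estimated by least squares on ln(y); log-scale residuals;
    reps reduced from the study's 1000."""
    rng = np.random.default_rng(seed)
    beta = np.array([0.05, 0.0025, 0.005, 0.0001, 0.0001][: p + 1])
    hits = 0
    for _ in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
        X[4, 1:] += np.median(X[:, 1:], axis=0) + zeta  # perturb 5th observation
        y = rng.lognormal(mean=X @ beta, sigma=sigma)
        logy = np.log(y)
        b = np.linalg.lstsq(X, logy, rcond=None)[0]
        resid = logy - X @ b
        h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
        s2 = resid @ resid / (n - p - 1)
        D = resid**2 / (s2 * (1 - h)) * h / ((p + 1) * (1 - h))
        hits += D[4] > 4 / (n - p - 1)
    return 100 * hits / reps

print(detection_percentage(reps=100))
```

Even in this stripped-down sketch, a perturbation of $\zeta = 100$ yields detection percentages near 100%, in line with the trend the study reports for Cook's distance.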

3.2. Results and Discussion

The simulation results, presented in Table 1, Table 2, Table 3 and Table 4 and visualized in Figures 1–64, evaluate the performance of the influence diagnostic methods (Cook's distance, modified Cook's distance, covariance ratio, and Hadi method) using fitted and quantile residuals across varying sample sizes (n = 25, 50, 100, 200), dispersion parameters ( σ = 0.5, 1, 3, 9), numbers of explanatory variables (p = 1, 2, 4, 8), and perturbations ( ζ ).
The sensitivity of the LNRM influence diagnostics with both residuals across all simulation parameters is shown in Figures 1–64, extended to include results for perturbations of 50 and 200. The detection percentage of influential observations serves as the primary performance metric.
For small sample sizes (n = 25) and low dispersion ( σ = 0.5), as shown in Table 1 and Figures 1–4, most methods detect high influence rates close to 100% under both fitted and quantile residuals for ζ = 100 . Cook's distance and the Hadi method consistently achieve 100% detection, outperforming the other methods. The covariance ratio and MCD show slightly lower rates (e.g., 99.5% and 99.8% for p = 1, σ = 0.5). The sensitivity analysis reveals that for ζ = 50 , detection rates remain high (above 98% for Cook's distance and the Hadi method, slightly lower for MCD), while for ζ = 200 , all methods approach 100% detection due to the increased perturbation magnitude, consistent with the trends in Figures 1–4. As dispersion increases ( σ = 9 ), detection rates remain robust, though MCD shows a slight decline (e.g., 98.5% for p = 1, n = 25, σ = 9 ) for ζ = 100 , with similar patterns for ζ = 50 (e.g., 97.8% for MCD) and near-perfect detection for ζ = 200 .
As sample sizes increase (n = 50, 100, 200) and dispersion parameters rise, as shown in Table 2, Table 3 and Table 4 and Figures 5–64, Cook's distance and the Hadi method maintain near-perfect detection rates (100%) across all scenarios for ζ = 100 . The covariance ratio performs strongly, with detection rates above 96%, though it drops slightly (e.g., 96.5% for p = 1, n = 200, σ = 0.5). The MCD exhibits notable underperformance, particularly with quantile residuals, with detection rates as low as 75.9% for p = 8, n = 200, σ = 9 (Table 4). The sensitivity analysis shows that for ζ = 50 , MCD's detection rates improve slightly (e.g., 80.2% for p = 8, n = 200, σ = 9) but remain lower than those of the other methods, while for ζ = 200 , MCD's performance approaches that of the other methods (e.g., 95.6%), reflecting the stronger signal of influential points. These trends align with Figures 49–64, where MCD's sensitivity to residual scale is evident. The underperformance of MCD is likely due to its reliance on eliminated studentized residuals, which may amplify small differences in quantile residuals, particularly in high-dispersion scenarios, due to their asymptotically normal distribution.
The sensitivity analysis confirms the robustness of Cook's distance and the Hadi method, as both maintain high detection rates under different perturbations ( ζ = 50 ,   100 ,   200 ), with minimal variation (e.g., 99.8–100% for Cook's distance). The covariance ratio shows slight sensitivity to lower perturbation ( ζ = 50 ), with detection rates dropping by 1–2% in some cases, but remains reliable. MCD's performance improves with larger perturbations ( ζ = 200 ), but its inconsistency with quantile residuals persists, suggesting that its formulation may not fully capture the lognormal distribution's skewness. Figures 1–64 illustrate that Cook's distance and the Hadi method consistently identify influential observations with clear peaks, with quantile residuals enhancing detection sensitivity compared to fitted residuals, a pattern that holds across all ζ values. One potential cause of MCD's underperformance is its sensitivity to the scale of residuals, which may not adequately capture leverage in the LNRM when quantile residuals are used. To improve MCD's performance, modifications such as robust scaling of residuals or incorporating leverage-adjusted weights could be explored, though such approaches require further validation.
To summarize the findings from Figures 1–64, the diagnostic plots clearly reinforce the results summarized in the tables. For example, Figure 1, Figure 5, Figure 13 and Figure 29 show that Cook's distance consistently produces good results in detecting influential observations across various sample sizes and dispersion parameters. Similarly, the Hadi method (see Figure 17, Figure 21, Figure 41 and Figure 61) demonstrates stable and accurate identification of influential points, maintaining high detection rates. Quantile residuals enhance this detectability, as seen in Figure 6, Figure 22, Figure 46 and Figure 62, where the influence detections are more pronounced than in their fitted residual counterparts (e.g., Figure 5, Figure 21 and Figure 45). In contrast, the MCD plots (e.g., Figure 8, Figure 24, Figure 48 and Figure 64) often exhibit less distinct or inconsistent detections, particularly under high dispersion and dimensionality, indicating its reduced robustness. The covariance ratio plots (Figure 9, Figure 25, Figure 47 and Figure 63) generally perform well but show more variability under extreme settings.
Collectively, these figures confirm that Cook’s distance and the Hadi method, especially when used with quantile residuals, are the most effective and reliable techniques for identifying influential observations in the LNRM.
Overall, Cook’s distance and the Hadi method are superior across all simulation conditions, with quantile residuals generally outperforming fitted residuals, particularly in larger samples and higher dispersion scenarios, and this robustness is consistent under different perturbation values.

4. Application: Atmospheric Environmental Dataset

We consider a real-life example to evaluate the effectiveness of the suggested methods, using a real-world dataset already discussed in the literature [38]. The dataset describes atmospheric environmental conditions in New York City and consists of n = 111 observations and p = 3 independent variables, where the response variable (y) is the average ozone concentration. The regressors are: x1 = solar radiation in Langleys, x2 = maximum daily temperature in degrees Fahrenheit, and x3 = average wind speed in miles per hour. Testing the probability distribution of the response variable is the first step in selecting an appropriate regression model. To identify a suitable distribution, we apply several goodness-of-fit tests (Anderson-Darling, Kolmogorov-Smirnov, and Cramer-von Mises). We fitted normal and lognormal distributions to the response variable, with the results given in Table 5. From Table 5, we find that the data are well fitted by the lognormal distribution, which is the reason for considering this dataset for lognormal regression diagnostics.
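The distribution-selection step can be illustrated with a small sketch. Since the real dataset is available only on request, the example below generates a synthetic lognormal stand-in for the ozone response (the parameters mean=3.5, sigma=0.8 are illustrative assumptions, not the dataset's estimates) and compares the Kolmogorov-Smirnov distances of fitted normal and lognormal distributions; a smaller distance indicates a better fit.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def ks_statistic(y, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of y and cdf."""
    y = np.sort(y)
    n = len(y)
    F = np.array([cdf(v) for v in y])
    i = np.arange(1, n + 1)
    return max(np.max(i / n - F), np.max(F - (i - 1) / n))

# Synthetic stand-in for the ozone response, drawn from a lognormal law.
rng = np.random.default_rng(42)
y = rng.lognormal(mean=3.5, sigma=0.8, size=111)

# Maximum-likelihood fits of both candidate distributions
mu_n, sd_n = y.mean(), y.std()
mu_ln, sd_ln = np.log(y).mean(), np.log(y).std()

d_norm = ks_statistic(y, lambda v: norm_cdf((v - mu_n) / sd_n))
d_lnorm = ks_statistic(y, lambda v: norm_cdf((np.log(v) - mu_ln) / sd_ln))
print(f"KS distance, fitted normal:    {d_norm:.4f}")
print(f"KS distance, fitted lognormal: {d_lnorm:.4f}")
```

For skewed, positive responses like ozone concentration, the lognormal fit yields the visibly smaller distance, mirroring the pattern in Table 5.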
The index plots of the diagnostic methods under fitted and quantile residuals are given in Figures 65–72. From Figure 65 and Figure 66, we observe that the 17th and 30th observations are diagnosed as influential by Cook’s distance with fitted residuals, while Cook’s distance with quantile residuals detects the 1st, 6th, 7th, 9th, 11th, 17th, 18th, 19th, 20th, 21st, 23rd, 30th, 43rd, 45th, 61st, 70th, 77th and 107th observations as influential. From Figure 67, we find that modified Cook’s distance with fitted residuals identifies the 17th and 30th observations as influential, while Figure 68 shows that modified Cook’s distance with quantile residuals identifies the 1st, 6th, 7th, 9th, 11th, 17th, 18th, 19th, 20th, 21st, 23rd, 30th, 43rd, 45th, 61st, 70th, 77th and 107th observations as influential. Figure 69 shows the index plot of the CVR method with fitted residuals, which detects the 4th, 6th, 7th, 14th, 17th and 88th observations as influential, while the 4th, 5th, 6th, 11th, 14th, 17th, 20th, 21st, 23rd, 30th, 43rd, 45th, 61st, 70th, 77th and 102nd observations are declared influential by the CVR with quantile residuals (see Figure 70). From Figure 71 and Figure 72, we observe that the 17th, 20th, and 30th observations are detected as influential by the Hadi method with both fitted and quantile residuals.
Now, we identify the observations that have an actual impact on the estimates of the LNRM. For this purpose, we calculate the percentage change in the LNRM estimates after removing each identified influential observation; Table 6 presents the findings. The 70th observation is the most influential, with the largest impact on the LNRM estimate of β 3 ; it is identified only by Cook’s distance and modified Cook’s distance with quantile residuals. The second most influential observation is the 20th, which affects the LNRM estimate of β 0 ; all diagnostic methods with quantile residuals, including the Hadi method, successfully detect this observation. The third is the 11th observation, which is diagnosed by all methods with quantile residuals except the Hadi method and affects the LNRM estimate of β 0 .
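A Table-6-style case-deletion impact can be computed by refitting the model without each flagged observation. The sketch below reports the percentage change in each coefficient of the log-scale fit; the exact change measure used in the paper may differ, so treat this as one plausible form under that assumption.

```python
import numpy as np

def deletion_impact(X, y, i):
    """Percent change in each coefficient of the lognormal regression
    (OLS on log y) when observation i is deleted. One plausible form of
    a case-deletion impact measure; the paper's exact formula may differ."""
    z = np.log(y)
    ls = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]
    beta_full = ls(X, z)
    keep = np.arange(len(y)) != i          # leave-one-out mask
    beta_del = ls(X[keep], z[keep])
    return 100.0 * np.abs((beta_del - beta_full) / beta_full)
```

Ranking observations by their largest per-coefficient change reproduces the kind of ordering reported in Table 6, where one deletion dominates a single coefficient.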
Influential observations in ozone modeling significantly affect lognormal regression parameters, stemming from unusual weather, measurement errors, or naturally extreme values. Unusual conditions, like heatwaves or wind anomalies, can skew estimates, as seen in the 20th and 70th observations impacting the intercept ( β 0 ) and wind speed coefficient ( β 3 ). Measurement errors, such as faulty solar radiation data, create misleading outliers (e.g., 11th observation). Naturally extreme values, like high ozone from clear skies (17th, 30th observations), reflect valid variability. Quantile residuals in Cook’s distance diagnostics effectively identify these points, enabling researchers to exclude errors, retain extreme values, and ensure reliable air quality modeling for policy-making.

5. Conclusions

Influential observations can significantly distort parameter estimates and inferences in regression models, particularly in the lognormal regression model (LNRM) due to its skewed response distribution. This study evaluates four diagnostic methods—Cook’s distance, modified Cook’s distance, covariance ratio, and the Hadi method—using fitted and quantile residuals to detect influential observations in the LNRM. Through an extensive Monte Carlo simulation with varying sample sizes, dispersion parameters, perturbation values, and numbers of explanatory variables, and a real-world application to an atmospheric environmental dataset, we assess their performance.
Simulation results indicate that Cook’s distance and the Hadi method consistently perform best under all conditions for moderate perturbation, with the sensitivity analysis showing robust performance for smaller and larger perturbations and detection rates remaining above 99.8%. The covariance ratio performs well, with detection rates above 96% in most scenarios, though it shows slight variability in high-dispersion settings and minor reductions (1–2%) for smaller perturbations. The MCD underperforms, particularly with quantile residuals, with detection rates as low as 75.9% in high-dimensional, high-dispersion scenarios under moderate perturbation. The sensitivity analysis shows improved MCD performance for larger perturbations, but its inconsistency persists for smaller perturbations, suggesting sensitivity to perturbation magnitude. The real-world application supports these findings, with Cook’s distance and the Hadi method effectively identifying influential observations that significantly impact LNRM estimates, particularly with quantile residuals. These findings underscore the robustness of Cook’s distance and the Hadi method, with quantile residuals enhancing diagnostic accuracy, making them the preferred tools for reliable influence diagnostics in the LNRM.
The underperformance of the MCD may be attributed to its reliance on deleted (externally studentized) residuals and the lognormal distribution’s variance structure, suggesting a need for robust modifications, such as leverage-adjusted weights or residual scaling, which warrant further research. Future work should include empirical testing of these proposed modifications, ideally by simulating scenarios with known leverage and residual scale interactions, and comparing robustness against established diagnostics like Cook’s distance and the Hadi method.
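The baseline MCD being discussed can be sketched from the deleted studentized residuals; the code below follows the common textbook form MCD_i = |t_i|·sqrt(((n − p)/p)·h_ii/(1 − h_ii)) applied on the log scale, which may differ in notational detail from the paper's definition.

```python
import numpy as np

def modified_cooks_distance(X, y):
    """Modified Cook's distance built from deleted (externally
    studentized) residuals on the log scale:
        MCD_i = |t_i| * sqrt(((n - p) / p) * h_ii / (1 - h_ii)).
    A textbook-form sketch; the paper's definition may differ in detail."""
    n, p = X.shape
    z = np.log(y)
    XtX_inv = np.linalg.inv(X.T @ X)
    e = z - X @ (XtX_inv @ X.T @ z)                 # log-scale residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_ii
    s2 = e @ e / (n - p)
    r = e / np.sqrt(s2 * (1 - h))                   # internally studentized
    t = r * np.sqrt((n - p - 1) / (n - p - r**2))   # externally studentized
    return np.abs(t) * np.sqrt((n - p) / p * h / (1 - h))
```

Because t_i amplifies large residuals while the leverage factor can stay small, the measure's sensitivity depends strongly on the interaction of leverage and residual scale, which is the mechanism the proposed modifications would need to address.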
To address influential observations in practice, researchers should: (1) verify data accuracy to distinguish errors from genuine extreme cases; (2) refit the model after excluding influential points to assess changes in the estimates; (3) conduct sensitivity analyses to evaluate model robustness; and (4) consider model expansion or variable transformation to address potential misspecification.

Author Contributions

Conceptualization, M.A.; methodology, M.H.; software, M.H. and M.A.; formal analysis, M.H.; writing—original draft preparation, M.H.; writing—review and editing, M.A. and S.M.A.A.; visualization, M.H.; supervision, M.A.; funding acquisition, S.M.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was funded by Umm Al-Qura University, Saudi Arabia under grant number: 25UQU4310037GSSR06.

Data Availability Statement

The data are available on request from the corresponding author.

Acknowledgments

The authors extend their appreciation to Umm Al-Qura University, Saudi Arabia for funding this research work through grant number: 25UQU4310037GSSR06.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis, 5th ed.; Wiley: Hoboken, NJ, USA, 2012. [Google Scholar]
  2. Limpert, E.; Stahel, W.A.; Abbt, M. Lognormal distributions across the sciences: Keys and clues. BioScience 2001, 51, 341–352. [Google Scholar] [CrossRef]
  3. Kleiber, C.; Zeileis, A. Applied Econometrics with R; Springer: New York, NY, USA, 2008. [Google Scholar]
  4. Cook, R.D. Assessment of local influence. J. R. Stat. Soc. Ser. B 1986, 48, 133–155. [Google Scholar] [CrossRef]
  5. Xiang, L.; Tse, S.K.; Lee, A.H. Influence diagnostics for generalized linear mixed models: Applications to clustered data. Comput. Stat. Data Anal. 2002, 40, 759–774. [Google Scholar] [CrossRef]
  6. Hoque, Z.; Khan, S.; Wesolowski, J. Performance of preliminary test estimator under Linex loss function. Commun. Stat. Theory Methods 2009, 38, 252–261. [Google Scholar] [CrossRef]
  7. Cook, R.D. Detection of influential observation in linear regression. Technometrics 1977, 19, 15–18. [Google Scholar] [CrossRef]
  8. Chatterjee, S.; Hadi, A.S. Influential observations, high leverage points, and outliers in linear regression. Stat. Sci. 1986, 1, 379–393. [Google Scholar]
  9. Balasooriya, U.; Daniel, L.; Rao, P.S. Identification of outliers and influential observations in linear regression: A robust approach. Commun. Stat. Simul. Comput. 1987, 16, 647–670. [Google Scholar]
  10. Brown, G.C.; Lawrence, A.J. Regression diagnostics for the identification of influential observations in linear models. Aust. N. Z. J. Stat. 2000, 42, 451–466. [Google Scholar]
  11. Meloun, M.; Militký, J. Detection of single influential points in OLS regression model building. Anal. Chim. Acta 2001, 439, 169–191. [Google Scholar] [CrossRef]
  12. Nurunnabi, A.A.M.; Rahmatullah Imon, A.H.M.; Nasser, M. Identification of multiple influential observations in logistic regression. J. Appl. Stat. 2010, 37, 1605–1624. [Google Scholar] [CrossRef]
  13. Vanegas, L.H.; Rondón, L.M.; Cordeiro, G.M. Diagnostic tools in generalized Weibull linear regression models. J. Stat. Comput. Simul. 2013, 83, 2315–2338. [Google Scholar] [CrossRef]
  14. Jang, D.H.; Anderson-Cook, C.M. Firework plots for evaluating the impact of outliers and influential observations in generalized linear models. Qual. Technol. Quant. Manag. 2015, 12, 423–436. [Google Scholar] [CrossRef]
  15. Zhang, Z. Residuals and regression diagnostics: Focusing on logistic regression. Ann. Transl. Med. 2016, 4, 195. [Google Scholar] [CrossRef]
  16. Amin, M.; Amanullah, M.; Aslam, M. Empirical evaluation of the inverse Gaussian regression residuals for the assessment of influential points. J. Chemom. 2016, 30, 394–404. [Google Scholar] [CrossRef]
  17. Bae, W.; Noh, S.; Kim, C. Case influence diagnostics for the significance of the linear regression model. Commun. Stat. Appl. Methods 2017, 24, 155–162. [Google Scholar] [CrossRef]
  18. Amin, M.; Amanullah, M.; Cordeiro, G.M. Influence diagnostics in the gamma regression model with adjusted deviance residuals. Commun. Stat.-Simul. Comput. 2017, 46, 6959–6973. [Google Scholar] [CrossRef]
  19. Imran, M.; Akbar, A. Diagnostics via partial residual plots in inverse Gaussian regression. J. Chemom. 2020, 34, e3203. [Google Scholar]
  20. Khaleeq, J.; Amanullah, M.; Almaspoor, Z. Influence diagnostics in log-normal regression model with censored data. Math. Probl. Eng. 2021, 2021, 1–15. [Google Scholar] [CrossRef]
  21. Khan, A.; Ullah, M.A.; Amin, M.; Muse, A.H.; Aldallal, R.; Mohamed, M.S. Empirical examination of the Poisson regression residuals for the evaluation of influential points. Math. Probl. Eng. 2022, 2022, 4681597. [Google Scholar] [CrossRef]
  22. Amin, M.; Fatima, A.; Akram, M.N.; Kamal, M. Influential observation detection in the logistic regression under different link functions: An application to urine calcium oxalate crystals data. J. Stat. Comput. Simul. 2024, 94, 346–359. [Google Scholar] [CrossRef]
  23. Camilleri, C.; Alter, U.; Cribbie, R.A. Identifying influential observations in multiple regression. Quant. Methods Psychol. 2024, 20, 96–105. [Google Scholar] [CrossRef]
  24. Soale, A.N. Detecting influential observations in single-index Fréchet regression. Technometrics 2025, 67, 311–322. [Google Scholar] [CrossRef]
  25. Khan, A.J.; Akbar, A.; Kibria, B.M.G. Influence of residuals on Cook’s distance for Beta regression model: Simulation and application. Hacet. J. Math. Stat. 2025, 54, 618–632. [Google Scholar] [CrossRef]
  26. Atkinson, A.C. Two graphical displays for outlying and influential observations in regression. Biometrika 1981, 68, 13–20. [Google Scholar] [CrossRef]
  27. Ajiferuke, I.; Famoye, F. Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models. J. Informetr. 2015, 9, 499–513. [Google Scholar] [CrossRef]
  28. Paula, G.A. On diagnostics in double generalized linear models. Comput. Stat. Data Anal. 2013, 68, 44–51. [Google Scholar] [CrossRef]
  29. Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244. [Google Scholar] [CrossRef]
  30. Rigby, R.A.; Stasinopoulos, M.D.; Heller, G.Z.; Bastiani, F.D. Distributions for Modeling Location, Scale, and Shape Using GAMLSS in R; CRC Press: New York, NY, USA, 2020. [Google Scholar]
  31. Pregibon, D. Logistic regression diagnostics. Ann. Stat. 1981, 9, 705–724. [Google Scholar] [CrossRef]
  32. Ullah, M.A.; Pasha, G.R. The origin and developments of influence measures in regression. Pak. J. Stat. 2009, 25, 295–309. [Google Scholar]
  33. Hardin, J.W.; Hilbe, J.M. Generalized Linear Models and Extensions, 3rd ed.; Stata Press: College Station, TX, USA, 2012. [Google Scholar]
  34. Fox, J. An R and S-Plus Companion to Applied Regression; Sage Publications: Thousand Oaks, CA, USA, 2002. [Google Scholar]
  35. Belsley, D.A.; Kuh, E.; Welsch, R. Regression Diagnostics Identifying Influential Data and Sources of Collinearity; Wiley: New York, NY, USA, 1980. [Google Scholar]
  36. Hadi, A.S. A New Measure of Overall Potential Influence in Linear Regression. Comput. Stat. Data Anal. 1992, 14, 1–27. [Google Scholar] [CrossRef]
  37. Kibria, B.M.G.; Månsson, K.; Shukur, G. Performance of Some Logistic Ridge Regression Estimators. Comput. Econ. 2012, 40, 401–414. [Google Scholar] [CrossRef]
  38. Bruntz, S.M.; Cleveland, W.S.; Graedel, T.E.; Kleiner, B.; Warner, J.L. Ozone concentrations in New Jersey and New York: Statistical association with related variables. Science 1974, 186, 257–259. [Google Scholar] [CrossRef] [PubMed]
Figure 1. LNRM’s Influence Diagnostics for p = 1, n = 25, σ = 0.5.
Figure 2. LNRM’s Influence Diagnostics for p = 1, n = 25, σ = 1.
Figure 3. LNRM’s Influence Diagnostics for p = 1, n = 25, σ = 3.
Figure 4. LNRM’s Influence Diagnostics for p = 1, n = 25, σ = 9.
Figure 5. LNRM’s Influence Diagnostics for p = 1, n = 50, σ = 0.5.
Figure 6. LNRM’s Influence Diagnostics for p = 1, n = 50, σ = 1.
Figure 7. LNRM’s Influence Diagnostics for p = 1, n = 50, σ = 3.
Figure 8. LNRM’s Influence Diagnostics for p = 1, n = 50, σ = 9.
Figure 9. LNRM’s Influence Diagnostics for p = 1, n = 100, σ = 0.5.
Figure 10. LNRM’s Influence Diagnostics for p = 1, n = 100, σ = 1.
Figure 11. LNRM’s Influence Diagnostics for p = 1, n = 100, σ = 3.
Figure 12. LNRM’s Influence Diagnostics for p = 1, n = 100, σ = 9.
Figure 13. LNRM’s Influence Diagnostics for p = 1, n = 200, σ = 0.5.
Figure 14. LNRM’s Influence Diagnostics for p = 1, n = 200, σ = 1.
Figure 15. LNRM’s Influence Diagnostics for p = 1, n = 200, σ = 3.
Figure 16. LNRM’s Influence Diagnostics for p = 1, n = 200, σ = 9.
Figure 17. LNRM’s Influence Diagnostics for p = 2, n = 25, σ = 0.5.
Figure 18. LNRM’s Influence Diagnostics for p = 2, n = 25, σ = 1.
Figure 19. LNRM’s Influence Diagnostics for p = 2, n = 25, σ = 3.
Figure 20. LNRM’s Influence Diagnostics for p = 2, n = 25, σ = 9.
Figure 21. LNRM’s Influence Diagnostics for p = 2, n = 50, σ = 0.5.
Figure 22. LNRM’s Influence Diagnostics for p = 2, n = 50, σ = 1.
Figure 23. LNRM’s Influence Diagnostics for p = 2, n = 50, σ = 3.
Figure 24. LNRM’s Influence Diagnostics for p = 2, n = 50, σ = 9.
Figure 25. LNRM’s Influence Diagnostics for p = 2, n = 100, σ = 0.5.
Figure 26. LNRM’s Influence Diagnostics for p = 2, n = 100, σ = 1.
Figure 27. LNRM’s Influence Diagnostics for p = 2, n = 100, σ = 3.
Figure 28. LNRM’s Influence Diagnostics for p = 2, n = 100, σ = 9.
Figure 29. LNRM’s Influence Diagnostics for p = 2, n = 200, σ = 0.5.
Figure 30. LNRM’s Influence Diagnostics for p = 2, n = 200, σ = 1.
Figure 31. LNRM’s Influence Diagnostics for p = 2, n = 200, σ = 3.
Figure 32. LNRM’s Influence Diagnostics for p = 2, n = 200, σ = 9.
Figure 33. LNRM’s Influence Diagnostics for p = 4, n = 25, σ = 0.5.
Figure 34. LNRM’s Influence Diagnostics for p = 4, n = 25, σ = 1.
Figure 35. LNRM’s Influence Diagnostics for p = 4, n = 25, σ = 3.
Figure 36. LNRM’s Influence Diagnostics for p = 4, n = 25, σ = 9.
Figure 37. LNRM’s Influence Diagnostics for p = 4, n = 50, σ = 0.5.
Figure 38. LNRM’s Influence Diagnostics for p = 4, n = 50, σ = 1.
Figure 39. LNRM’s Influence Diagnostics for p = 4, n = 50, σ = 3.
Figure 40. LNRM’s Influence Diagnostics for p = 4, n = 50, σ = 9.
Figure 41. LNRM’s Influence Diagnostics for p = 4, n = 100, σ = 0.5.
Figure 42. LNRM’s Influence Diagnostics for p = 4, n = 100, σ = 1.
Figure 43. LNRM’s Influence Diagnostics for p = 4, n = 100, σ = 3.
Figure 44. LNRM’s Influence Diagnostics for p = 4, n = 100, σ = 9.
Figure 45. LNRM’s Influence Diagnostics for p = 4, n = 200, σ = 0.5.
Figure 46. LNRM’s Influence Diagnostics for p = 4, n = 200, σ = 1.
Figure 47. LNRM’s Influence Diagnostics for p = 4, n = 200, σ = 3.
Figure 48. LNRM’s Influence Diagnostics for p = 4, n = 200, σ = 9.
Figure 49. LNRM’s Influence Diagnostics for p = 8, n = 25, σ = 0.5.
Figure 50. LNRM’s Influence Diagnostics for p = 8, n = 25, σ = 1.
Figure 51. LNRM’s Influence Diagnostics for p = 8, n = 25, σ = 3.
Figure 52. LNRM’s Influence Diagnostics for p = 8, n = 25, σ = 9.
Figure 53. LNRM’s Influence Diagnostics for p = 8, n = 50, σ = 0.5.
Figure 54. LNRM’s Influence Diagnostics for p = 8, n = 50, σ = 1.
Figure 55. LNRM’s Influence Diagnostics for p = 8, n = 50, σ = 3.
Figure 56. LNRM’s Influence Diagnostics for p = 8, n = 50, σ = 9.
Figure 57. LNRM’s Influence Diagnostics for p = 8, n = 100, σ = 0.5.
Figure 58. LNRM’s Influence Diagnostics for p = 8, n = 100, σ = 1.
Figure 59. LNRM’s Influence Diagnostics for p = 8, n = 100, σ = 3.
Figure 60. LNRM’s Influence Diagnostics for p = 8, n = 100, σ = 9.
Figure 61. LNRM’s Influence Diagnostics for p = 8, n = 200, σ = 0.5.
Figure 62. LNRM’s Influence Diagnostics for p = 8, n = 200, σ = 1.
Figure 63. LNRM’s Influence Diagnostics for p = 8, n = 200, σ = 3.
Figure 64. LNRM’s Influence Diagnostics for p = 8, n = 200, σ = 9.
Figure 65. Index plot of Cook’s distance under fitted residuals.
Figure 66. Index plot of Cook’s distance under quantile residuals.
Figure 67. Index plot of modified Cook’s distance under fitted residuals.
Figure 68. Index plot of modified Cook’s distance under quantile residuals.
Figure 69. Index plot of CVR under fitted residuals.
Figure 70. Index plot of CVR under quantile residuals.
Figure 71. Index plot of Hadi method under fitted residuals.
Figure 72. Index plot of Hadi method under quantile residuals.
Table 1. Performance of LNRM influence diagnostics for p = 1.

| n | σ | CD (ri) | CD (rqi) | MCD (ri) | MCD (rqi) | CVR (ri) | CVR (rqi) | HD (ri) | HD (rqi) |
|---|---|---|---|---|---|---|---|---|---|
| 25 | 0.5 | 99.9 | 99.9 | 99.5 | 99.8 | 99.7 | 99.9 | 100 | 100 |
| 25 | 1 | 99.9 | 99.9 | 99.9 | 99.9 | 99.7 | 100 | 100 | 100 |
| 25 | 3 | 100 | 100 | 99.9 | 99.5 | 100 | 100 | 100 | 100 |
| 25 | 9 | 99.9 | 99.6 | 99.7 | 98.5 | 100 | 99.8 | 100 | 100 |
| 50 | 0.5 | 99.8 | 99.9 | 99.1 | 99.5 | 99.9 | 99.9 | 100 | 100 |
| 50 | 1 | 99.9 | 99.9 | 99.4 | 99.3 | 99.7 | 100 | 100 | 100 |
| 50 | 3 | 100 | 99.9 | 99.7 | 99.3 | 100 | 100 | 100 | 100 |
| 50 | 9 | 100 | 99.6 | 99.9 | 97.4 | 100 | 100 | 100 | 100 |
| 100 | 0.5 | 99.9 | 99.9 | 98.2 | 99 | 99.9 | 99.8 | 100 | 100 |
| 100 | 1 | 99.9 | 99.9 | 98.3 | 98.3 | 100 | 99.8 | 100 | 100 |
| 100 | 3 | 100 | 100 | 99.7 | 98.6 | 100 | 99.7 | 100 | 100 |
| 100 | 9 | 100 | 99.6 | 99.7 | 92.6 | 100 | 100 | 100 | 100 |
| 200 | 0.5 | 100 | 100 | 96.5 | 98.6 | 100 | 99.9 | 100 | 100 |
| 200 | 1 | 99.8 | 99.8 | 96.5 | 96.6 | 99.9 | 100 | 100 | 100 |
| 200 | 3 | 100 | 99.9 | 98.4 | 94.4 | 100 | 100 | 100 | 100 |
| 200 | 9 | 100 | 99.4 | 99.4 | 88.3 | 99.8 | 100 | 100 | 100 |
Table 2. Performance of LNRM influence diagnostics for p = 2.

| n | σ | CD (ri) | CD (rqi) | MCD (ri) | MCD (rqi) | CVR (ri) | CVR (rqi) | HD (ri) | HD (rqi) |
|---|---|---|---|---|---|---|---|---|---|
| 25 | 0.5 | 100 | 100 | 99.8 | 99.8 | 99.8 | 99.9 | 100 | 100 |
| 25 | 1 | 100 | 100 | 100 | 100 | 100 | 99.7 | 100 | 100 |
| 25 | 3 | 100 | 99.9 | 99.8 | 99.7 | 99.8 | 99.8 | 100 | 100 |
| 25 | 9 | 100 | 99.6 | 99.9 | 98.7 | 99.9 | 99.9 | 100 | 100 |
| 50 | 0.5 | 99.9 | 99.9 | 99.3 | 99.8 | 99.8 | 100 | 100 | 100 |
| 50 | 1 | 100 | 100 | 99.7 | 99.8 | 99.8 | 99.9 | 100 | 100 |
| 50 | 3 | 100 | 100 | 99.7 | 99.2 | 99.8 | 99.6 | 100 | 100 |
| 50 | 9 | 100 | 99.6 | 99.9 | 97.7 | 100 | 99.9 | 100 | 100 |
| 100 | 0.5 | 99.7 | 99.9 | 98 | 98.7 | 99.9 | 99.9 | 100 | 100 |
| 100 | 1 | 100 | 100 | 98.8 | 98.8 | 99.7 | 99.9 | 100 | 100 |
| 100 | 3 | 100 | 99.8 | 99.5 | 98.4 | 99.6 | 99.8 | 100 | 100 |
| 100 | 9 | 100 | 99.5 | 99.7 | 93.4 | 100 | 100 | 100 | 100 |
| 200 | 0.5 | 99.9 | 99.9 | 96.4 | 98 | 100 | 99.8 | 100 | 100 |
| 200 | 1 | 99.8 | 99.8 | 97.2 | 97.3 | 100 | 100 | 100 | 100 |
| 200 | 3 | 100 | 99.9 | 97.9 | 94 | 99.7 | 100 | 100 | 100 |
| 200 | 9 | 100 | 99.5 | 99.6 | 84.1 | 100 | 100 | 100 | 100 |
Table 3. Performance of LNRM influence diagnostics for p = 4.

| n | σ | CD (ri) | CD (rqi) | MCD (ri) | MCD (rqi) | CVR (ri) | CVR (rqi) | HD (ri) | HD (rqi) |
|---|---|---|---|---|---|---|---|---|---|
| 25 | 0.5 | 100 | 100 | 100 | 100 | 99.2 | 99.5 | 100 | 100 |
| 25 | 1 | 99.9 | 99.9 | 99.6 | 99.6 | 99.5 | 99.7 | 100 | 100 |
| 25 | 3 | 100 | 99.9 | 99.8 | 99.2 | 99.6 | 99.5 | 100 | 100 |
| 25 | 9 | 100 | 99.7 | 99.9 | 99.1 | 100 | 98.2 | 100 | 100 |
| 50 | 0.5 | 99.7 | 99.9 | 98.8 | 99.2 | 98.9 | 99.7 | 100 | 100 |
| 50 | 1 | 100 | 100 | 99.5 | 99.3 | 99.3 | 99.5 | 100 | 100 |
| 50 | 3 | 100 | 99.9 | 99.5 | 98.8 | 99.5 | 99.2 | 100 | 100 |
| 50 | 9 | 100 | 99.8 | 99.9 | 96.3 | 99.7 | 99.8 | 100 | 100 |
| 100 | 0.5 | 99.7 | 99.7 | 97 | 99 | 99.2 | 99.7 | 100 | 100 |
| 100 | 1 | 99.8 | 99.8 | 98.6 | 98.6 | 99.3 | 100 | 100 | 100 |
| 100 | 3 | 99.8 | 99.6 | 98.7 | 96.3 | 99 | 99.6 | 100 | 100 |
| 100 | 9 | 100 | 98.8 | 99.2 | 87.9 | 99.6 | 100 | 100 | 100 |
| 200 | 0.5 | 99.7 | 99.9 | 94.5 | 97.8 | 100 | 99.8 | 100 | 100 |
| 200 | 1 | 99.7 | 99.7 | 95.9 | 95.9 | 99.8 | 99.4 | 100 | 100 |
| 200 | 3 | 99.8 | 99.4 | 97.5 | 92.5 | 99.8 | 100 | 100 | 100 |
| 200 | 9 | 99.9 | 99 | 99 | 77.2 | 99.4 | 100 | 100 | 100 |
Table 4. Performance of LNRM influence diagnostics for p = 8.

| n | σ | CD (ri) | CD (rqi) | MCD (ri) | MCD (rqi) | CVR (ri) | CVR (rqi) | HD (ri) | HD (rqi) |
|---|---|---|---|---|---|---|---|---|---|
| 25 | 0.5 | 99.8 | 99.9 | 99.5 | 99.6 | 91.9 | 96.3 | 100 | 100 |
| 25 | 1 | 99.9 | 99.9 | 99.5 | 99.5 | 94.5 | 94.4 | 100 | 100 |
| 25 | 3 | 99.9 | 99.7 | 99.6 | 98.9 | 97.1 | 90.2 | 100 | 100 |
| 25 | 9 | 100 | 99.1 | 99.8 | 96.7 | 98.5 | 79.9 | 100 | 100 |
| 50 | 0.5 | 100 | 100 | 99.1 | 99.5 | 98.3 | 99.1 | 100 | 100 |
| 50 | 1 | 100 | 100 | 98.6 | 98.6 | 98 | 97.3 | 100 | 100 |
| 50 | 3 | 100 | 99.9 | 99.5 | 98.3 | 99 | 97.1 | 100 | 100 |
| 50 | 9 | 100 | 98.6 | 99.7 | 92.5 | 99.1 | 98.8 | 100 | 100 |
| 100 | 0.5 | 99.7 | 99.8 | 97.2 | 98.5 | 98.4 | 99.3 | 100 | 100 |
| 100 | 1 | 99.9 | 99.9 | 97.4 | 97.5 | 98.9 | 99 | 100 | 100 |
| 100 | 3 | 99.9 | 99.8 | 98.5 | 96.2 | 99 | 98.7 | 100 | 100 |
| 100 | 9 | 100 | 99.3 | 99.5 | 88.7 | 99.7 | 100 | 100 | 100 |
| 200 | 0.5 | 99.3 | 99.7 | 91.3 | 95 | 99.4 | 99.3 | 100 | 100 |
| 200 | 1 | 99.9 | 99.9 | 95.4 | 95.5 | 98.6 | 98.9 | 100 | 100 |
| 200 | 3 | 99.8 | 99.8 | 98.2 | 90.4 | 99.4 | 99.9 | 100 | 100 |
| 200 | 9 | 100 | 99 | 99.1 | 75.9 | 99.2 | 100 | 100 | 100 |
Table 5. Distribution goodness-of-fit tests for the atmospheric environmental dataset.

| Goodness-of-Fit Test | Normal Statistic | Normal p-Value | Lognormal Statistic | Lognormal p-Value |
|---|---|---|---|---|
| Anderson-Darling | 4.5943 | 0.004511 | 0.44688 | 0.801 |
| Kolmogorov-Smirnov | 0.1513 | 0.01242 | 0.057907 | 0.8507 |
| Cramer-von Mises | 0.32593 | 0.7337 | 0.13688 | 0.9982 |
Table 6. Absolute change in LNRM’s estimates after deleting influential observations.

| Inf. Obs. | β 0 | β 1 | β 2 | β 3 |
|---|---|---|---|---|
| 1 | 152.988 | 97.7375 | 101.825 | 94.1318 |
| 4 | 100.804 | 98.781 | 99.1259 | 98.9379 |
| 5 | 97.1943 | 99.1633 | 98.8861 | 99.1343 |
| 6 | 152.684 | 100.517 | 102.281 | 99.5857 |
| 7 | 91.7559 | 102.477 | 99.0024 | 105.552 |
| 9 | 44.1822 | 104.301 | 95.4712 | 102.726 |
| 11 | **165.435** | 102.067 | 102.75 | 98.4527 |
| 17 | 105.55 | 86.0803 | 89.4113 | 111.529 |
| 18 | 122.637 | 104.281 | 99.0503 | 92.3117 |
| 19 | 27.0797 | 94.416 | 95.6552 | 103.407 |
| 20 | **180.588** | 101.772 | 103.497 | 96.9804 |
| 21 | 126.563 | 104.711 | 99.9874 | 99.2246 |
| 23 | 133.837 | 97.3313 | 100.374 | 93.1342 |
| 30 | 34.2729 | 93.7509 | 97.4727 | 114.026 |
| 43 | 108.45 | 102.6 | 98.6887 | 94.6056 |
| 45 | 150.102 | 91.8711 | 102.288 | 91.2524 |
| 61 | 138.806 | 92.5402 | 101.741 | 93.4166 |
| 70 | 119.679 | 103.799 | 99.0141 | **960.832** |
| 77 | 150.942 | 96.5666 | 100.945 | 89.1567 |
| 88 | 71.7297 | 100.884 | 97.4467 | 102.636 |
| 102 | 97.8347 | 99.2629 | 99.428 | 100.854 |
| 107 | 107.326 | 103.962 | 99.5836 | 104.149 |

Note: Bold value indicates the larger influence of an observation.