Article

Q-Function-Based Diagnostic and Spatial Dependence in Reparametrized t-Student Linear Model

by Miguel A. Uribe-Opazo 1, Rosangela C. Schemmer 2, Fernanda De Bastiani 3, Manuel Galea 4, Rosangela A. B. Assumpção 5 and Tamara C. Maltauro 1,*

1 Technological and Exact Sciences Center, Western Paraná State University, Cascavel 85819-110, PR, Brazil
2 Biopark Technology Park, Toledo 85919-899, PR, Brazil
3 Department of Statistics, Federal University of Pernambuco, Recife 50670-420, PE, Brazil
4 Department of Statistics, Faculty of Mathematics, Pontifical Catholic University of Chile, Santiago 7820436, Chile
5 Mathematics Coordination, Federal Technological University of Paraná, Toledo 85902-490, PR, Brazil
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(18), 3035; https://doi.org/10.3390/math13183035
Submission received: 13 August 2025 / Revised: 6 September 2025 / Accepted: 9 September 2025 / Published: 20 September 2025

Abstract

Characterizing the spatial variability of agricultural data is a fundamental step in precision agriculture, especially in soil management and the creation of differentiated management units for increasing productivity. Modeling the spatial dependence structure using geostatistical methods is of great importance for efficiently estimating the parameters that define this structure and for performing kriging-based interpolation. This work presents diagnostic techniques for global and local influence and generalized leverage using the displacement of the conditional expectation of the logarithm of the joint-likelihood, called the Q-function. This method is used to identify influential observations that can interfere with parameter estimation, geostatistical model selection, map construction, and the description of spatial variability. To study spatially correlated data, we used a spatial linear model based on the reparameterized t-Student distribution. This distribution has been used as an alternative to the normal distribution when data have outliers, and it has the same form of covariance matrix as the normal distribution, which enables a direct comparison between them. The methodology is illustrated using one real data set, and the results showed that the modeling was more robust in the presence of influential observations. The study of these observations is indispensable for decision-making in precision agriculture.

1. Introduction

Geostatistics differs from classical statistics in that classical models usually focus on frequency analysis, whereas geostatistical models also incorporate the spatial correlation of the samples. Geostatistics assumes that the difference between two sample points depends on the distance between them and on their orientation; that is, closer pairs of observations are more similar to each other than more distant pairs. In practice, however, atypical observations can affect the behavior of spatial dependence, especially when the normal distribution is assumed. Several authors, such as [1,2,3,4], have presented more robust approaches, with model estimates that are less sensitive to these observations. An alternative proposed by [5], and studied later by [6], is to use the reparametrized t-Student distribution, which belongs to the class of symmetric distributions. It reduces the influence of outliers, guarantees a finite second moment, and allows a direct comparison between the covariance matrices of the t-Student and normal distributions. According to [7], a single observation can have a significant influence on the results of a spatial data analysis: it can considerably alter the estimates that define the spatial dependence structure and, consequently, change the constructed maps.
Following this thinking, diagnostic analysis is extremely important for detecting these observations. Several diagnostic analyses are presented as follows: Ref. [8] assessed local influence using elliptical linear models with a longitudinal structure; Ref. [9] used diagnostic techniques to assess the sensitivity of maximum likelihood estimators, the covariance function, and the linear predictor to small perturbations in the data and the assumptions of the spatial linear model. Refs. [10,11] worked with diagnostic techniques in Gaussian spatial linear models with repetition. Ref. [12] worked with the diagnostics of influence in spatial models with censored responses.
The purpose of this paper is to extend the work of [6] for the application of diagnostic analysis in the reparametrized t-Student distribution, through global and local influence techniques developed using the maximum likelihood and Q-function, which is an alternative to Cook’s procedure [13].
This paper proceeds as follows. Section 2 is divided into the following subsections. The t-Student spatial linear model subsection presents the reparametrized t-Student linear spatial model. The maximum likelihood estimation subsection presents the maximum likelihood estimation and the development of the parameter estimation process. The iterative algorithm subsection describes the algorithm. The asymptotic standard error estimation subsection explains how to obtain an estimate of the asymptotic standard errors. The selection of the parameter of form η subsection describes the criteria for selecting the form parameter η. The QQ-plot subsection describes the methodology for obtaining the QQ-plot. The influence diagnostics subsection develops diagnostic tools to detect influential points using techniques of global and local influence and generalized leverage. Section 3 applies the methodology to a real data set under the reparameterized t-Student model and presents the results obtained. Section 4 briefly discusses the results in comparison with works found in the literature. Section 5 presents the conclusions of the work.

2. Materials and Methods

2.1. The t-Student Spatial Linear Model

An alternative to the multivariate normal distribution is the multivariate t-Student distribution; this distribution has been widely used in the study of real data because it has heavier tails and can accommodate the atypical points present in the data set. Furthermore, it is a symmetrical distribution with an additional parameter, the degrees of freedom, represented by $\nu$ ($\nu > 0$), which allows us to model the kurtosis of the data.
Ref. [5] suggests reparameterizing the t-Student distribution to allow for a direct comparison of the estimates of the mean vector parameters and the covariance matrix with those of the normal model. The reparameterization is a transformation of the degrees of freedom, $\eta = 1/\nu$. It is justified by the importance of modeling the spatial dependence structure, since the new shape parameter $\eta$ is bounded ($0 < \eta < 1/2$) under the assumption that the process has a finite second moment. This allows the model parameters to be estimated using the EM algorithm; the estimates are then used in kriging interpolation and, later, in map construction [14].
Following the methodology of [6], which presents the reparametrized multivariate t-Student distribution, the transformation $\nu = 1/\eta$ is applied. Let $Y = (Y_1, \ldots, Y_n)^\top$ be a random vector following a reparametrized t-Student distribution with fixed form parameter $\eta$, where $0 < \eta < 1/2$, with $n \times n$ covariance matrix $\Sigma$ and mean vector $E(Y) = \mu$. Its probability density function is given in Equation (1) as follows:
$$f_Y(y) = K_n(\eta)\,|\Sigma|^{-1/2}\,\bigl(1 + c(\eta)\,\delta\bigr)^{-\frac{1}{2\eta}(1 + n\eta)}, \tag{1}$$
where
$$K_n(\eta) = \left(\frac{c(\eta)}{\pi}\right)^{n/2} \frac{\Gamma\!\left(\frac{1 + n\eta}{2\eta}\right)}{\Gamma\!\left(\frac{1}{2\eta}\right)},$$
$\delta = (Y - \mu)^\top \Sigma^{-1} (Y - \mu)$ is the Mahalanobis distance, and $c(\eta) = \eta/(1 - 2\eta)$ for $0 < \eta < 1/2$. Let $Y \sim T_n(\mu, \Sigma, \eta)$ denote a random vector following an n-variate reparameterized t-Student distribution. The stochastic representation of $Y$ is given by
$$Y \overset{d}{=} \mu + V^{-1/2} Z,$$
where $Z \sim N_n(0, \Sigma)$, $V \sim \mathrm{Gamma}\!\left(\frac{1}{2\eta}, \frac{1}{2c(\eta)}\right)$, $c(\eta) = \eta/(1 - 2\eta)$, $0 < \eta < 1/2$, and $V$ and $Z$ are independent.
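As a short worked check of why $\Sigma$ is the covariance matrix of $Y$ under this representation, the derivation below assumes that the second argument of the Gamma distribution is a rate parameter (an assumption about the parameterization, not stated explicitly in the text); the symbols $a$ and $b$ are introduced here only for illustration.

```latex
% Assumption: V ~ Gamma(a, b) with shape a = 1/(2\eta) and RATE b = 1/(2c(\eta)).
\begin{align*}
E\!\left[V^{-1}\right] &= \frac{b}{a - 1}
  = \frac{\dfrac{1 - 2\eta}{2\eta}}{\dfrac{1}{2\eta} - 1}
  = \frac{\dfrac{1 - 2\eta}{2\eta}}{\dfrac{1 - 2\eta}{2\eta}} = 1,
  \qquad a > 1 \iff \eta < \tfrac{1}{2}, \\
\operatorname{Cov}(Y) &= E\!\left[V^{-1}\right]\Sigma = \Sigma .
\end{align*}
```

Under that assumption, the bound $\eta < 1/2$ is exactly the condition for $E[V^{-1}]$ to exist, which matches the finite second moment requirement mentioned above.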
To study spatial dependence, consider an isotropic second-order stochastic process $\{Y(s_i), s_i \in S\}$, where $S \subset \mathbb{R}^2$ and $\mathbb{R}^2$ is the two-dimensional Euclidean space. Let $Y = (Y(s_1), \ldots, Y(s_n))^\top$ be an $n \times 1$ response vector corresponding to the sites $s_i$, $i = 1, \ldots, n$, following an n-variate reparameterized t-Student distribution, denoted by $Y \sim T_n(\mu, \Sigma, \eta)$, where each element can be written as $Y(s_i) = \mu(s_i) + e(s_i)$, $i = 1, \ldots, n$, and both the deterministic term $\mu(s_i)$ and the stochastic term $e(s_i)$ may depend on the spatial location at which $Y(s_i)$ is observed. It is assumed that the random errors $e(s_i)$ have expectation zero, that is, $E[e(s_i)] = 0$, and that the variation between points is determined by a covariance function $\mathrm{Cov}[e(s_i), e(s_u)] = \mathrm{Cov}[Y(s_i), Y(s_u)] = C(s_i, s_u) = \sigma_{iu}$, for $i, u = 1, \ldots, n$. Suppose that for some known functions of $s_i$, $x_1(s_i), \ldots, x_p(s_i)$, the mean of the stochastic process is given in Equation (2) as follows:
$$\mu(s_i) = \sum_{j=1}^{p} x_j(s_i)\,\beta_j, \tag{2}$$
where $\beta_1, \ldots, \beta_p$ are unknown parameters to be estimated.
In matrix notation, the spatial linear model is given by
$$Y = X\beta + \varepsilon, \tag{3}$$
where $Y \sim T_n(X\beta, \Sigma, \eta)$; $X$ is an $n \times p$ matrix of full column rank whose i-th row is the $1 \times p$ vector $x_i^\top = (x_{i1}, \ldots, x_{ip})$ of explanatory variables at the site $s_i$; $\beta = (\beta_1, \ldots, \beta_p)^\top$ is a $p \times 1$ vector of unknown parameters to be estimated; and $\varepsilon = (e(s_1), \ldots, e(s_n))^\top$ is the vector of correlated errors, with $\varepsilon \sim T_n(0, \Sigma, \eta)$.
The spatial modeling given in Equation (3) depends on the structure of the covariance matrix $\Sigma = (\sigma_{iu})$ of the stochastic process $Y$, where $\sigma_{iu} = C(s_i, s_u)$ for $i, u = 1, \ldots, n$. A covariance function $C(s_i, s_u)$ is used in the spatial dependence study of the stationary process and is specified by a three-dimensional vector $\tau = (\tau_1, \tau_2, \tau_3)^\top$. As presented by [15], the parametric form is given in Equation (4) as follows:
$$\Sigma = \tau_1 I_n + \tau_2 R(\tau_3), \tag{4}$$
where $\tau_1 \geq 0$ is the nugget effect; $\tau_2 \geq 0$ is known as the sill; $I_n$ is the identity matrix of order $n$; and $R(\tau_3)$ is an $n \times n$ symmetric matrix whose elements are functions of $\tau_3 > 0$, $R = R(\tau_3) = [(r_{ij})]$, where $r_{ii} = 1$, $r_{ij} = \tau_2^{-1} C(s_i, s_j)$ for $\tau_2 \neq 0$, and $r_{ij} = 0$ for $\tau_2 = 0$, $i \neq j = 1, \ldots, n$. The element $r_{ij}$ depends on the Euclidean distance $h_{ij} = \lVert s_i - s_j \rVert$ between the points $s_i$ and $s_j$. An alternative reparametrization of the covariance function $C(s_i, s_u) = C(h_{iu})$ is suggested by [16] and adapted by [6], which helps with the identifiability of the model. In this paper, we considered the parameters $\phi_1 = \tau_1$, $\phi_2 = \tau_2/\tau_3^{2\kappa}$ and $\phi_3 = \tau_3$. This last parameter was considered fixed, and the reparametrization of $C(h_{iu})$ was based on the same criteria established by [6].
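To make the structure of Equation (4) concrete, the following minimal sketch builds $\Sigma = \tau_1 I_n + \tau_2 R(\tau_3)$ for a handful of sites. It assumes an exponential correlation $r_{ij} = \exp(-h_{ij}/\tau_3)$, which is only one of the families the paper mentions; the function name, the coordinates, and the parameter values are hypothetical and chosen for illustration.

```python
import math

def exponential_covariance(coords, tau1, tau2, tau3):
    """Return Sigma = tau1 * I_n + tau2 * R(tau3) as a list of lists.

    Assumption of this sketch: exponential correlation
    r_ij = exp(-h_ij / tau3), with h_ij the Euclidean distance.
    """
    n = len(coords)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h = math.dist(coords[i], coords[j])   # Euclidean distance h_ij
            r = math.exp(-h / tau3)               # r_ii = 1 because h_ii = 0
            nugget = tau1 if i == j else 0.0      # nugget only on the diagonal
            sigma[i][j] = nugget + tau2 * r
    return sigma

# Three hypothetical sites in the plane.
coords = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
sigma = exponential_covariance(coords, tau1=0.5, tau2=2.0, tau3=1.0)
for row in sigma:
    print([round(v, 4) for v in row])
```

The diagonal equals $\tau_1 + \tau_2$ and the off-diagonal entries decay with distance, which is the behavior the nugget and sill parameters are meant to capture.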

2.2. Maximum Likelihood Estimation

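Under the assumption that $Y \sim T_n(X\beta, \Sigma, \eta)$, where $\eta$ is the fixed form parameter and $\theta = (\beta^\top, \phi^\top)^\top$, with $\beta = (\beta_1, \ldots, \beta_p)^\top$ and $\phi = (\phi_1, \phi_2)^\top$, with $\phi_1$ and $\phi_2$ as unknown parameters and $\phi_3$ as a parameter to be defined and fixed according to the semivariogram, the log-likelihood of the reparameterized t-Student distribution is given in Equation (5) by:
$$L(\theta) = \log K_n(\eta) - \frac{1}{2}\log|\Sigma| - \frac{1}{2\eta}(1 + n\eta)\log\bigl(1 + c(\eta)\,\delta\bigr), \tag{5}$$
where
  • $\log K_n(\eta) = \frac{n}{2}\log\!\left(\frac{c(\eta)}{\pi}\right) + \log\Gamma\!\left(\frac{1 + n\eta}{2\eta}\right) - \log\Gamma\!\left(\frac{1}{2\eta}\right)$,
  • $\delta = (Y - X\beta)^\top \Sigma^{-1} (Y - X\beta)$, $c(\eta) = \eta/(1 - 2\eta)$, $0 < \eta < 1/2$.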
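The log-likelihood in Equation (5) can be evaluated numerically as in the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: NumPy is assumed to be available, and the function name `t_loglik` and the toy inputs are hypothetical.

```python
import math
import numpy as np

def t_loglik(y, X, beta, Sigma, eta):
    """Log-likelihood (5) of the reparameterized t-Student linear model."""
    if not 0.0 < eta < 0.5:
        raise ValueError("eta must lie in (0, 1/2)")
    n = len(y)
    c = eta / (1.0 - 2.0 * eta)                      # c(eta)
    eps = y - X @ beta                               # residual vector
    delta = float(eps @ np.linalg.solve(Sigma, eps)) # Mahalanobis distance
    _, logdet = np.linalg.slogdet(Sigma)             # log|Sigma|
    log_K = (0.5 * n * math.log(c / math.pi)
             + math.lgamma((1.0 + n * eta) / (2.0 * eta))
             - math.lgamma(1.0 / (2.0 * eta)))
    return (log_K - 0.5 * logdet
            - (1.0 + n * eta) / (2.0 * eta) * math.log1p(c * delta))

# Toy inputs (hypothetical): two observations, intercept-only mean.
y = np.array([1.0, 1.0])
X = np.ones((2, 1))
beta = np.array([0.0])
Sigma = np.eye(2)
print(t_loglik(y, X, beta, Sigma, eta=0.25))
print(t_loglik(y, X, beta, Sigma, eta=1e-6))   # close to the normal log-likelihood
```

For very small $\eta$ the value approaches the Gaussian log-likelihood with the same mean and covariance, which illustrates the direct comparison with the normal model that motivates the reparameterization.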
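The score functions can be written as:
  • $U(\beta) = \dfrac{\partial L(\theta)}{\partial \beta} = w(\delta)\, X^\top \Sigma^{-1} \varepsilon$,
  • $U(\phi_j) = \dfrac{\partial L(\theta)}{\partial \phi_j} = -\dfrac{1}{2}\left[\mathrm{tr}(\Sigma^{-1}\dot{\Sigma}_j) - w(\delta)\, \varepsilon^\top \Sigma^{-1} \dot{\Sigma}_j \Sigma^{-1} \varepsilon\right]$,
  • where $\dot{\Sigma}_j = \partial\Sigma/\partial\phi_j$ for $j = 1, 2$, $w(\delta) = \bigl((1 + n\eta)/\eta\bigr)\, c(\eta)\, q^{-1}$, $q = 1 + c(\eta)\,\delta$ and $\varepsilon = Y - X\beta$.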
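The weight $w(\delta)$ is what gives the model its robustness: it shrinks as the Mahalanobis distance grows. The short sketch below evaluates it for a few values; the function names and the numerical inputs are hypothetical and serve only as an illustration.

```python
def c_eta(eta):
    """c(eta) = eta / (1 - 2*eta), defined for 0 < eta < 1/2."""
    if not 0.0 < eta < 0.5:
        raise ValueError("eta must lie in (0, 1/2)")
    return eta / (1.0 - 2.0 * eta)

def weight(delta, n, eta):
    """w(delta) = ((1 + n*eta) / eta) * c(eta) / (1 + c(eta) * delta)."""
    c = c_eta(eta)
    q = 1.0 + c * delta
    return (1.0 + n * eta) / eta * c / q

# Larger Mahalanobis distances receive smaller weights.
for delta in (2.0, 10.0, 50.0):
    print(delta, weight(delta, n=2, eta=0.25))

# As eta approaches 0 the weight approaches 1, the constant weight
# implied by the normal model.
print(weight(50.0, n=10, eta=1e-8))
```

Because the score functions depend on the data only through $w(\delta)\varepsilon$ and $w(\delta)\varepsilon\varepsilon^\top$, a data set with a large $\delta$ contributes less to the estimating equations than it would under the normal model.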
The parameter vector $\theta = (\beta^\top, \phi^\top)^\top$ can be estimated by maximum likelihood (ML) by solving the score equations $U(\beta) = 0$ and $U(\phi) = 0$.
For $\beta$, the solution of $U(\beta) = 0$ is obtained directly as follows:
$$\hat{\beta} = \bigl(X^\top (\Sigma^{(k)})^{-1} X\bigr)^{-1} X^\top (\Sigma^{(k)})^{-1} Y,$$
with $k$ representing the k-th iteration.
To obtain the ML estimate $\hat{\phi}$, the components $\phi_j$, $j = 1, 2$, are determined by solving the system (6). From $U(\phi) = 0$ we have:
$$\mathrm{tr}(\Sigma^{-1}\dot{\Sigma}_j) = \mathrm{tr}(\Sigma^{-1}\dot{\Sigma}_j\Sigma^{-1}E), \tag{6}$$
where $E = w(\delta)\,\varepsilon\varepsilon^\top$.
Considering $\Sigma = \sum_{l=1}^{2}\phi_l\dot{\Sigma}_l$, where $\dot{\Sigma}_1 = I_n$ and $\dot{\Sigma}_2 = R(\phi_3)$, so that $\mathrm{tr}(\Sigma^{-1}\dot{\Sigma}_j) = \mathrm{tr}(\Sigma^{-1}\dot{\Sigma}_j\Sigma^{-1}\Sigma)$, Equation (6) becomes:
$$\sum_{l=1}^{2}\mathrm{tr}(\Sigma^{-1}\dot{\Sigma}_j\Sigma^{-1}\dot{\Sigma}_l)\,\phi_l = \mathrm{tr}(\Sigma^{-1}\dot{\Sigma}_j\Sigma^{-1}E). \tag{7}$$
Notice that in matrix notation, the system (7) can be written as:
$$A\,\phi = b,$$
where $A$ is a $2 \times 2$ matrix with elements $a_{jl} = \mathrm{tr}(\Sigma^{-1}\dot{\Sigma}_j\Sigma^{-1}\dot{\Sigma}_l)$ and $b$ is a $2 \times 1$ vector with elements $b_j = \mathrm{tr}(\Sigma^{-1}\dot{\Sigma}_j\Sigma^{-1}E)$, which correspond to $a_{11} = \mathrm{tr}(\Sigma^{-1}\Sigma^{-1})$, $a_{12} = a_{21} = \mathrm{tr}(\Sigma^{-1}\Sigma^{-1}R(\phi_3))$, $a_{22} = \mathrm{tr}(\Sigma^{-1}R(\phi_3)\Sigma^{-1}R(\phi_3))$, $b_1 = \mathrm{tr}(\Sigma^{-1}\Sigma^{-1}E)$ and $b_2 = \mathrm{tr}(\Sigma^{-1}R(\phi_3)\Sigma^{-1}E)$. The estimate is given in Equation (8) by solving this system:
$$\hat{\phi} = A^{-1} b. \tag{8}$$
The iterative algorithm subsection describes the algorithm for estimating the parameters in $\theta$ by iteration; in other words, it yields $\hat{\theta} = \theta^{(k)}$ when the convergence criterion is reached in the k-th iteration.

2.3. Iterative Algorithm

Based on [17], the parameter vector $\theta$ is estimated with an iterative algorithm, using the following procedure:
Initial iteration: set $k = 0$, where $k$ indexes the iteration:
  • Step 1: Define an initial guess for the parameter to be estimated, $\theta^{(k)} = \theta_0$, where $\theta_0 = (\beta_0^\top, \phi_0^\top, \eta_0)^\top$. In this study, the initial values $\beta_0$ and $\phi_0$ are obtained from a regression model with a normal distribution, and $\eta_0$ is held fixed over all iterations; it is later chosen by the cross-validation ($CV$) and trace ($Tr$) criteria presented in the selection of the parameter of form $\eta$ section.
  • Step 2: From the initial parameters obtained in Step 1, compute the following quantities:
    $\Sigma^{(k)} = \phi_1^{(k)} I + \phi_2^{(k)} R^{(k)}(\phi_3)$, where $\phi_3$ is fixed at its initial value,
    $c(\eta) = \dfrac{\eta}{1 - 2\eta}$,
    $q^{(k)} = 1 + c(\eta)\,\delta^{(k)}$,
    $\delta^{(k)} = (\varepsilon^{(k)})^\top (\Sigma^{(k)})^{-1} \varepsilon^{(k)}$,
    $\varepsilon^{(k)} = Y - X\beta^{(k)}$,
    $w(\delta)^{(k)} = \dfrac{1 + n\eta}{\eta}\, c(\eta)\, (q^{(k)})^{-1}$,
    $\beta^{(k)} = \bigl(X^\top (\Sigma^{(k)})^{-1} X\bigr)^{-1} X^\top (\Sigma^{(k)})^{-1} Y$,
    $E^{(k)} = w(\delta)^{(k)}\, \varepsilon^{(k)} (\varepsilon^{(k)})^\top$,
    where $R = R(\phi_3)$ is the correlation matrix determined by the exponential, Gaussian, or Matérn family models, as introduced by [9]. Note that, on the first pass, the quantities computed in Step 2 are the initial values, with $k = 0$.
  • Step 3: From this point on, the parameters are updated to iteration $(k + 1)$, where $\theta^{(k+1)} = (\beta^{(k+1)\top}, \phi^{(k+1)\top}, \eta_0)^\top$. Consider the following procedures:
  • Step 3.1: Update the linear parameters $\phi_1$ and $\phi_2$ of $\Sigma$ through the linear system (9):
    $$A^{(k)}\,\phi^{(k+1)} = b^{(k)}, \tag{9}$$
    where $A^{(k)}$ is a $2 \times 2$ matrix with elements $a_{jl}^{(k)} = \mathrm{tr}\bigl((\Sigma^{(k)})^{-1}\dot{\Sigma}_j(\Sigma^{(k)})^{-1}\dot{\Sigma}_l\bigr)$ and $b^{(k)}$ is a $2 \times 1$ vector with elements $b_j^{(k)} = \mathrm{tr}\bigl((\Sigma^{(k)})^{-1}\dot{\Sigma}_j(\Sigma^{(k)})^{-1}E^{(k)}\bigr)$, $j, l = 1, 2$, which correspond to $a_{11}^{(k)} = \mathrm{tr}\bigl((\Sigma^{(k)})^{-1}(\Sigma^{(k)})^{-1}\bigr)$, $a_{12}^{(k)} = a_{21}^{(k)} = \mathrm{tr}\bigl((\Sigma^{(k)})^{-1}(\Sigma^{(k)})^{-1}R^{(k)}(\phi_3)\bigr)$, $a_{22}^{(k)} = \mathrm{tr}\bigl((\Sigma^{(k)})^{-1}R^{(k)}(\phi_3)(\Sigma^{(k)})^{-1}R^{(k)}(\phi_3)\bigr)$, and, for the vector $b^{(k)}$, $b_1^{(k)} = \mathrm{tr}\bigl((\Sigma^{(k)})^{-1}(\Sigma^{(k)})^{-1}E^{(k)}\bigr)$ and $b_2^{(k)} = \mathrm{tr}\bigl((\Sigma^{(k)})^{-1}R^{(k)}(\phi_3)(\Sigma^{(k)})^{-1}E^{(k)}\bigr)$.
  • Step 3.2: With $\phi^{(k+1)}$ obtained in Step 3.1, update $\Sigma^{(k+1)}$, which is then used to update $\beta^{(k+1)}$ through the expression given in Equation (10):
    $$\beta^{(k+1)} = \bigl(X^\top (\Sigma^{(k+1)})^{-1} X\bigr)^{-1} X^\top (\Sigma^{(k+1)})^{-1} Y. \tag{10}$$
  • Step 4: The iteration ends with $\theta^{(k+1)}$ (that is, $\theta^{(1)}$ on the first pass), obtained from the updates in Steps 3.1 and 3.2. The convergence criterion defined for this algorithm is then applied: $\lVert\theta^{(k+1)} - \theta^{(k)}\rVert < e_1$. Since $\eta$ and $\phi_3$ are fixed, it suffices to verify $\lVert\beta^{(k+1)} - \beta^{(k)}\rVert < e_1$ and $|\phi_j^{(k+1)} - \phi_j^{(k)}| < e_1$ ($j = 1, 2$), or $|L_c(\theta^{(k+1)}) - L_c(\theta^{(k)})| < e_2$. If the criterion is met, stop and define $\hat{\theta} = \theta^{(k+1)}$; otherwise, set $k = k + 1$ and return to Step 2. Here $e_1$ and $e_2$ are constant tolerances; typical values are $10^{-3}$ and $10^{-6}$, respectively.
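The steps above can be sketched as the following fixed-point iteration. This is a minimal illustration under several assumptions, not the authors' code: NumPy is assumed to be available, all function names are hypothetical, the correlation matrix `R` is passed in already built (so $\phi_3$ is fixed), convergence is checked on the parameter changes only, and a positivity safeguard on $\phi$ has been added that is not part of the description above. The simulated data are hypothetical as well.

```python
import numpy as np

def weight(delta, n, eta):
    """w(delta) = ((1 + n*eta)/eta) * c(eta) / (1 + c(eta)*delta)."""
    c = eta / (1.0 - 2.0 * eta)
    return (1.0 + n * eta) / eta * c / (1.0 + c * delta)

def phi_system(Sigma_inv, dSigma, E):
    """Build A (2x2) and b (length 2) of the linear system A phi = b."""
    M = [Sigma_inv @ D for D in dSigma]                 # Sigma^{-1} dSigma_j
    A = np.array([[np.trace(Mj @ Ml) for Ml in M] for Mj in M])
    b = np.array([np.trace(Mj @ Sigma_inv @ E) for Mj in M])
    return A, b

def gls_beta(X, y, Sigma_inv):
    """beta = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y."""
    return np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

def fit(y, X, R, eta, phi0, e1=1e-3, max_iter=200):
    """Iterate Steps 2 to 4 with eta and R held fixed."""
    n = len(y)
    dSigma = [np.eye(n), R]                             # dSigma/dphi1, dSigma/dphi2
    phi = np.asarray(phi0, dtype=float)
    Sigma_inv = np.linalg.inv(phi[0] * dSigma[0] + phi[1] * R)
    beta = gls_beta(X, y, Sigma_inv)
    converged = False
    for _ in range(max_iter):
        eps = y - X @ beta                              # Step 2
        delta = eps @ Sigma_inv @ eps
        E = weight(delta, n, eta) * np.outer(eps, eps)
        A, b = phi_system(Sigma_inv, dSigma, E)         # Step 3.1
        phi_new = np.linalg.solve(A, b)
        if np.any(phi_new <= 0.0):                      # safeguard (assumption of this sketch)
            break
        Sigma_inv = np.linalg.inv(phi_new[0] * dSigma[0] + phi_new[1] * R)
        beta_new = gls_beta(X, y, Sigma_inv)            # Step 3.2
        change = max(np.max(np.abs(beta_new - beta)),
                     np.max(np.abs(phi_new - phi)))
        beta, phi = beta_new, phi_new
        if change < e1:                                 # Step 4
            converged = True
            break
    return beta, phi, converged

# Hypothetical data on a one-dimensional transect, exponential correlation.
rng = np.random.default_rng(0)
s = np.linspace(0.0, 10.0, 25)
R = np.exp(-np.abs(s[:, None] - s[None, :]) / 2.0)
X = np.column_stack([np.ones(25), s])
noise = rng.multivariate_normal(np.zeros(25), 0.3 * np.eye(25) + R)
y = X @ np.array([1.0, 0.5]) + noise
beta_hat, phi_hat, ok = fit(y, X, R, eta=0.1, phi0=[0.5, 0.5])
print(beta_hat, phi_hat, ok)
```

Returning a convergence flag instead of raising an error is a design choice of this sketch: the update of $\phi$ is a plain linear solve and is not guaranteed to stay positive, so the caller should check the flag before using the estimates.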

2.4. Asymptotic Standard Error Estimation

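Asymptotic standard errors can be calculated by inverting the expected information matrix $F(\theta)$, where $F(\theta) = E[-\ddot{L}(\theta)]$, with $\ddot{L}(\theta) = \partial^2 L(\theta)/\partial\theta\,\partial\theta^\top$. For the reparametrized t-Student distribution, $F(\theta)$ is given in Equation (11) by [18],
$$F = F(\theta) = \begin{pmatrix} F_{\beta\beta} & 0 \\ 0 & F_{\phi\phi} \end{pmatrix}, \tag{11}$$
where
$$F_{\beta\beta} = F_{\beta\beta}(\beta) = \frac{1 + n\eta}{(1 - 2\eta)\bigl(1 + (n + 2)\eta\bigr)}\, X^\top \Sigma^{-1} X,$$
and
$$F_{\phi\phi} = F_{\phi\phi}(\phi) = \frac{1}{4}\,\frac{\partial\,\mathrm{vec}^\top(\Sigma)}{\partial\phi}\left[\frac{2(1 + n\eta)}{1 + (n + 2)\eta}\,(\Sigma^{-1}\otimes\Sigma^{-1})\,N_n + \left(\frac{1 + n\eta}{1 + (n + 2)\eta} - 1\right)\mathrm{vec}(\Sigma^{-1})\,\mathrm{vec}^\top(\Sigma^{-1})\right]\frac{\partial\,\mathrm{vec}(\Sigma)}{\partial\phi^\top},$$
where $N_n = \frac{1}{2}(I_{n^2} + K_n)$, $K_n$ is the commutation matrix of order $n^2 \times n^2$, and $I_{n^2}$ is the identity matrix of order $n^2 \times n^2$ (see [19]).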
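As a worked illustration of the scalar factor multiplying $X^\top\Sigma^{-1}X$ in $F_{\beta\beta}$, the block below evaluates it in the limit $\eta \to 0$ and for one numerical case. The symbol $g_n(\eta)$ is introduced here only for convenience, and the value $\eta = 0.1$ is an arbitrary choice for illustration ($n = 78$ is the sample size of the application).

```latex
% g_n(\eta) is a label introduced for this illustration only.
\begin{align*}
g_n(\eta) &= \frac{1 + n\eta}{(1 - 2\eta)\bigl(1 + (n + 2)\eta\bigr)},
\qquad \lim_{\eta \to 0} g_n(\eta) = 1, \\
g_{78}(0.1) &= \frac{1 + 7.8}{(0.8)(1 + 8.0)} = \frac{8.8}{7.2} \approx 1.222 .
\end{align*}
```

In the limit $\eta \to 0$ the block $F_{\beta\beta}$ reduces to $X^\top\Sigma^{-1}X$, the familiar expected information for $\beta$ under the normal linear model with covariance matrix $\Sigma$.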

2.5. Selection of the Parameter of Form η

According to [20,21], the log-likelihood function given in Equation (5) is decreasing in $\eta$, so this parameter cannot be estimated by maximizing the log-likelihood $L(\theta)$. As an alternative, Ref. [22] proposes using the trace of the asymptotic covariance matrix of an estimated mean as a criterion for selecting the best model within the class of elliptical distributions. Following this idea, Ref. [21] presents the trace criterion and cross-validation to select the degrees of freedom $\nu$ of the spatial linear t-Student model. Building on these proposals, Ref. [6] presents the trace criterion ($Tr$) and the cross-validation criterion ($CV$) to select the form parameter $\eta$ of the reparameterized t-Student spatial linear model. For both methods, the best form parameter $\eta$ is the one with the smallest cross-validation and trace values. Once $\eta$ is chosen, the Matérn model parameter $\kappa$ [23] is selected according to the smallest asymptotic standard error.

2.6. QQ-Plot

Following the methodology presented in [24] and given the data set $\{Y(s_1), \ldots, Y(s_n)\}$, we determine the vector of residuals $\hat{\varepsilon} = Y - X\hat{\beta}$, where $\hat{\beta} = (X^\top\hat{\Sigma}^{-1}X)^{-1}X^\top\hat{\Sigma}^{-1}Y$, with $\hat{\beta}$ the estimated parameter vector and $\hat{\Sigma}$ the estimated covariance matrix, both obtained as described in the iterative algorithm section and the selection of the parameter of form $\eta$ section. From the estimate $\hat{\Sigma}$, we use the Cholesky decomposition $\hat{\Sigma} = \hat{L}\hat{L}^\top$, where $\hat{L}$ is a lower triangular matrix of order $n$. Using the inverse matrix $\hat{L}^{-1}$, we determine $\hat{\varepsilon}_{nc} = \hat{L}^{-1}\hat{\varepsilon}$, defined as the vector of uncorrelated residuals. We then used the methodology of [25], with the packages qqplotr and ggplot2 in the R program, to build QQ plots, taking the vector of uncorrelated residuals $\hat{\varepsilon}_{nc}$ as the sample data and a random vector with the t-Student distribution as the theoretical data, both of order $n$. To obtain the confidence intervals, we used the [26] package with the boot method, which creates confidence bands based on a parametric bootstrap.
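The decorrelation step can be sketched as follows. The example is written in Python with NumPy purely as an illustration (the paper itself works in R); the function name and the small covariance matrix are hypothetical.

```python
import numpy as np

def uncorrelated_residuals(y, X, Sigma_hat):
    """Return (eps_hat, eps_nc), where eps_nc = L^{-1} eps_hat and
    Sigma_hat = L L' is the Cholesky decomposition (L lower triangular)."""
    Sigma_inv = np.linalg.inv(Sigma_hat)
    beta_hat = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
    eps_hat = y - X @ beta_hat
    L = np.linalg.cholesky(Sigma_hat)          # lower triangular factor
    eps_nc = np.linalg.solve(L, eps_hat)       # L^{-1} eps_hat without forming L^{-1}
    return eps_hat, eps_nc

# Hypothetical inputs: three observations, intercept-only mean.
y = np.array([1.0, 3.0, 2.0])
X = np.ones((3, 1))
Sigma_hat = np.array([[4.0, 2.0, 0.0],
                      [2.0, 3.0, 1.0],
                      [0.0, 1.0, 2.0]])
eps_hat, eps_nc = uncorrelated_residuals(y, X, Sigma_hat)
print(eps_nc)
```

A useful sanity check is that the squared norm of the uncorrelated residuals equals the Mahalanobis distance $\hat{\varepsilon}^\top\hat{\Sigma}^{-1}\hat{\varepsilon}$, since $\hat{L}^{-\top}\hat{L}^{-1} = \hat{\Sigma}^{-1}$.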

2.7. Influence Diagnostics

An important part of diagnostic analysis is the detection of influential observations, that is, points that exert a disproportionate weight on the estimates of the model parameters or even on their significance. Case deletion is perhaps the best-known technique for evaluating the impact that removing a particular observation has on the regression estimates. This paper considers two types of influence diagnostics: global and local influence.

2.8. Global Influence

Deleting cases is a common way to assess the effect of an observation on the estimation process. This is a global influence analysis, since the effect of the observation is evaluated by eliminating it from the data set.

2.8.1. Global Influence Based on the Likelihood

The global influence analysis is based on the elimination of one or more observations considered influential in the data set, and thus assesses the impact on the parameter estimates. This type of diagnostic technique is discussed by [27,28,29,30]. One of the most used measures of the changes in estimated parameters after excluding observations is Cook's distance (see [31]). This measure was initially proposed for normal models and was quickly extended to several model classes. Following the proposal of [30], the Cook's distance based on the ML estimator $\hat{\theta}$ of $\theta = (\beta^\top, \phi^\top)^\top$ is given by Equation (12):
$$D_i^1(\theta) = U_{(i)}(\hat{\theta})^\top\, F(\hat{\theta})^{-1}\, U_{(i)}(\hat{\theta}), \quad \text{for } i = 1, \ldots, n, \tag{12}$$
which can be decomposed into $D_i^1(\theta) = D_i(\beta) + D_i(\phi)$, for $i = 1, \ldots, n$, where $D_i(\beta) = U_{(i)}(\hat{\beta})^\top F_{\beta\beta}^{-1}(\hat{\beta})\, U_{(i)}(\hat{\beta})$ and $D_i(\phi) = U_{(i)}(\hat{\phi})^\top F_{\phi\phi}^{-1}(\hat{\phi})\, U_{(i)}(\hat{\phi})$, and where $U_{(i)}(\hat{\beta})$ and $U_{(i)}(\hat{\phi})$ are the score functions for $\beta$ and $\phi$, respectively, computed without the i-th observation.

2.8.2. Global Influence Based on the Q-Function

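According to [30], it is difficult to extend the case-deletion method to other models if the likelihood function has no analytical form. For this reason, Refs. [11,30] present a Cook's distance based on the conditional expectation of the logarithm of the joint-likelihood, called the Q-function. Following this reasoning, consider that the Q-function is given in Equation (13) by:
$$Q(\theta|\theta^*) = E\bigl[l_c(\theta|Y_c)\,\big|\,Y, \theta^*\bigr] = -\frac{n}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\log\Gamma\!\left(\frac{1}{2\eta}\right) + \frac{1}{2\eta}\log\!\left(\frac{1}{2c(\eta)}\right) + \frac{n}{2}a^* + \frac{1}{2\eta}\bigl(a^* - w(\delta^*)\bigr) - \frac{1}{2}w(\delta^*)\,\delta, \tag{13}$$
where $a^* = \log w(\delta^*) - \psi\!\left(n + \frac{1}{\eta}\right) + \log\!\left(\frac{1 + n\eta}{\eta}\right)$, with $w(\delta^*) = \frac{1 + n\eta}{\eta}\,c(\eta)\,(q^*)^{-1}$, $q^* = 1 + c(\eta)\,\delta(\theta^*)$, and $\psi(\cdot)$ the digamma function, for $0 < \eta < 1/2$. As an alternative to computing $\hat{\theta}_{(i)}$ directly, Ref. [32] proposed a global influence measure based on the following approximation: $\hat{\theta}_{(i)} = \hat{\theta} + \bigl\{-\ddot{Q}_{(i)}(\hat{\theta}|\hat{\theta})\bigr\}^{-1}\dot{Q}_{(i)}(\hat{\theta}|\hat{\theta})$, for $i = 1, \ldots, n$, where $\dot{Q}_{(i)}(\hat{\theta}|\hat{\theta}) = \left.\frac{\partial Q_{(i)}(\theta|\hat{\theta})}{\partial\theta}\right|_{\theta = \hat{\theta}}$ and $\ddot{Q}_{(i)}(\hat{\theta}|\hat{\theta}) = \left.\frac{\partial^2 Q_{(i)}(\theta|\hat{\theta})}{\partial\theta\,\partial\theta^\top}\right|_{\theta = \hat{\theta}}$.
According to [11,30], the modification of the Cook's distance based on the Q-function is given by Equation (14):
$$QD_i^1(\theta) = \dot{Q}_{(i)}(\hat{\theta}|\hat{\theta})^\top \bigl\{E[-\ddot{Q}(\hat{\theta}|\hat{\theta})]\bigr\}^{-1} \dot{Q}_{(i)}(\hat{\theta}|\hat{\theta}), \quad \text{for } i = 1, \ldots, n, \tag{14}$$
where $\bigl\{E[-\ddot{Q}(\hat{\theta}|\hat{\theta})]\bigr\}^{-1}$ is a block diagonal matrix with respect to $\beta$ and $\phi$. The modified Cook statistic $QD_i^1(\theta)$ in (14) can be written as:
$$QD_i^1(\theta) = QD_i^1(\beta) + QD_i^1(\phi), \quad \text{for } i = 1, \ldots, n,$$
where
$$QD_i^1(\beta) = \dot{Q}_{\beta(i)}(\hat{\theta}|\hat{\theta})^\top \bigl\{E[-\ddot{Q}_{\beta\beta}(\hat{\theta}|\hat{\theta})]\bigr\}^{-1} \dot{Q}_{\beta(i)}(\hat{\theta}|\hat{\theta})$$
and
$$QD_i^1(\phi) = \dot{Q}_{\phi(i)}(\hat{\theta}|\hat{\theta})^\top \bigl\{E[-\ddot{Q}_{\phi\phi}(\hat{\theta}|\hat{\theta})]\bigr\}^{-1} \dot{Q}_{\phi(i)}(\hat{\theta}|\hat{\theta}).$$
Thus,
$$\dot{Q}_{\beta(i)}(\hat{\theta}|\hat{\theta}) = w_{(i)}(\hat{\delta})\, X_{(i)}^\top \hat{\Sigma}_{(i)}^{-1} \hat{\varepsilon}_{(i)},$$
$$\dot{Q}_{\phi(i)}(\hat{\theta}|\hat{\theta}) = \frac{1}{2}\,\frac{\partial\,\mathrm{vec}^\top(\hat{\Sigma}_{(i)})}{\partial\phi}\left[w_{(i)}(\hat{\delta})\,\mathrm{vec}\bigl(\hat{\Sigma}_{(i)}^{-1}\hat{\varepsilon}_{(i)}\hat{\varepsilon}_{(i)}^\top\hat{\Sigma}_{(i)}^{-1}\bigr) - \mathrm{vec}\bigl(\hat{\Sigma}_{(i)}^{-1}\bigr)\right],$$
$$E[-\ddot{Q}_{\beta\beta}(\hat{\theta}|\hat{\theta})] = -E\!\left[\left.\frac{\partial^2 Q(\theta|\hat{\theta})}{\partial\beta\,\partial\beta^\top}\right|_{\theta = \hat{\theta}}\right] = w(\hat{\delta})\, X^\top \hat{\Sigma}^{-1} X,$$
$$E[-\ddot{Q}_{\phi\phi}(\hat{\theta}|\hat{\theta})] = -E\!\left[\left.\frac{\partial^2 Q(\theta|\hat{\theta})}{\partial\phi\,\partial\phi^\top}\right|_{\theta = \hat{\theta}}\right] = \frac{1}{2}\,\frac{\partial\,\mathrm{vec}^\top(\hat{\Sigma})}{\partial\phi}\,(\hat{\Sigma}^{-1}\otimes\hat{\Sigma}^{-1})\,\frac{\partial\,\mathrm{vec}(\hat{\Sigma})}{\partial\phi^\top},$$
with $w(\hat{\delta}) = \frac{1 + n\eta}{\eta}\,c(\eta)\,\hat{q}^{-1}$, $w_{(i)}(\hat{\delta}) = \frac{1 + n\eta}{\eta}\,c(\eta)\,\hat{q}_{(i)}^{-1}$, $\hat{q} = 1 + c(\eta)\,\hat{\delta}$ and $\hat{q}_{(i)} = 1 + c(\eta)\,\hat{\delta}_{(i)}$.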

2.9. Local Influence Diagnostics

The local influence method proposed by [33] evaluates the robustness of the estimates obtained, by means of an influence measure, under small perturbations applied to the model and/or the data; that is, it verifies whether there are observations that can distort the results under small perturbations. This method does not require the elimination of observations.
Consider the spatial linear model given in Equation (3). Under the assumption that the random error vector $\varepsilon$ follows a t-Student distribution with mean equal to a vector of zeros and covariance matrix $\Sigma$, i.e., $\varepsilon \sim T_n(0, \Sigma, \eta)$, it is possible to obtain the regression model $\hat{Y} = X\hat{\beta}$.
Thus, when an observation causes relevant changes in the results, it is called influential.
The appropriate perturbation scheme for the response variable, according to [21], is given in Equation (15) by:
$$Y_\omega = Y + A\,\omega, \tag{15}$$
where $\omega$ is an $n \times 1$ vector belonging to the perturbation space $\Omega$.

2.9.1. Likelihood Displacement Diagnostics

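Let $L(\theta|\omega)$ be the perturbed log-likelihood. The influence of the perturbation caused by the vector $\omega$ on the ML estimates of $\theta$ can be evaluated by the likelihood displacement, defined as
$$LD(\omega) = 2\bigl[L(\hat{\theta}) - L(\hat{\theta}_\omega)\bigr],$$
where $\hat{\theta}$ is the ML estimator of $\theta$ under the postulated model and $\hat{\theta}_\omega$ is the ML estimator of $\theta$ under the model perturbed by $\omega \in \Omega$. Ref. [33] proposed studying the local behavior of $LD(\omega)$ around $\omega_0 \in \Omega$, such that $L(\theta|\omega_0) = L(\theta)$. Thus, the normal curvature $C_l$ of $LD(\omega)$ at $\omega_0$ in the direction of a unit vector $l$ is defined in Equation (16):
$$C_l = 2\,\bigl|\,l^\top \Delta^\top \ddot{L}^{-1} \Delta\, l\,\bigr|, \tag{16}$$
where $\lVert l \rVert = 1$; $\ddot{L}$ is the Hessian matrix evaluated at $\theta = \hat{\theta}$; and $\Delta$ is a $(p + q) \times n$ matrix given by $\Delta = (\Delta_\beta^\top, \Delta_\phi^\top)^\top$, evaluated at $\theta = \hat{\theta}$ and $\omega = \omega_0$, where $\Delta_\beta = \frac{\partial^2 L(\theta|\omega)}{\partial\beta\,\partial\omega^\top}$ and $\Delta_\phi = \frac{\partial^2 L(\theta|\omega)}{\partial\phi\,\partial\omega^\top}$.
Ref. [21] considered the generalized appropriate perturbation of [34], $Y_\omega = Y + A\,\omega$, on the response variable, using an $n \times n$ matrix $A$ that depends neither on $\beta$ nor on $\omega$, such that the Fisher information matrix for $Y_\omega$ with respect to the perturbation vector $\omega$ is $G(\omega_0) = c\,A^\top\Sigma^{-1}A$, where $c$ is a positive constant. In general, $A^\top\Sigma^{-1}A \neq I_n$; however, if $A = \Sigma^{1/2}$, then $G(\omega_0) = c\,I_n$, and the appropriate perturbation for the reparameterized t-Student model, $Y_\omega = Y + \Sigma^{1/2}\omega$, is of the form given in Equation (15). In this study, the matrix $\Delta = (\Delta_\beta^\top, \Delta_\phi^\top)^\top$, evaluated at $\theta = \hat{\theta}$ and $\omega = \omega_0$, is given by
$$\Delta_\beta = w(\hat{\delta})\left\{X^\top\hat{\Sigma}^{-1/2} - 2\,c(\eta)\,\hat{q}^{-1}\,X^\top\hat{\Sigma}^{-1}\hat{\varepsilon}\hat{\varepsilon}^\top\hat{\Sigma}^{-1/2}\right\},$$
and $\Delta_\phi$ with elements
$$\Delta_{\phi_j} = w(\hat{\delta})\left\{\hat{\Sigma}^{-1/2}\,\frac{\partial\hat{\Sigma}^{1/2}}{\partial\phi_j}\,\hat{\Sigma}^{-1/2}\hat{\varepsilon} - c(\eta)\,\hat{q}^{-1}\,\hat{\Sigma}^{-1/2}\hat{\varepsilon}\hat{\varepsilon}^\top\hat{\Sigma}^{-1}\,\frac{\partial\hat{\Sigma}}{\partial\phi_j}\,\hat{\Sigma}^{-1}\hat{\varepsilon}\right\}, \quad j = 1, 2,$$
with $\hat{\varepsilon} = \hat{\varepsilon}(\omega_0) = Y - X\hat{\beta}$, $w(\hat{\delta}) = \frac{1 + n\eta}{\eta}\,c(\eta)\,\hat{q}^{-1}$, $\hat{q} = 1 + c(\eta)\,\hat{\delta}$, $\hat{\delta} = (Y - X\hat{\beta})^\top\hat{\Sigma}^{-1}(Y - X\hat{\beta})$, $c(\eta) = \eta/(1 - 2\eta)$ and $0 < \eta < 1/2$.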
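The choice $A = \Sigma^{1/2}$ can be checked numerically. The sketch below assumes NumPy and takes $\Sigma^{1/2}$ to be the symmetric square root obtained from the eigendecomposition, which is one possible choice; the function names and the small matrix are hypothetical.

```python
import numpy as np

def sym_sqrt(Sigma):
    """Symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(Sigma)
    return (vecs * np.sqrt(vals)) @ vecs.T     # V diag(sqrt(lambda)) V'

def perturb_response(y, Sigma, omega):
    """Perturbed response Y_omega = Y + Sigma^{1/2} omega."""
    return y + sym_sqrt(Sigma) @ omega

Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
A = sym_sqrt(Sigma)
G = A.T @ np.linalg.inv(Sigma) @ A             # A' Sigma^{-1} A
print(np.round(G, 10))                         # identity matrix, up to rounding

y = np.array([1.0, 3.0, 2.0])
omega0 = np.zeros(3)                           # the no-perturbation point
print(perturb_response(y, Sigma, omega0))      # equals y
```

With this choice every component of $\omega$ is on the same scale, so the elements of the influence direction can be compared across observations.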
Consider the matrix $B = \Delta^\top \ddot{L}^{-1} \Delta$ and $C_i = 2\,|b_{ii}|$, for $i = 1, \ldots, n$, where $b_{ii}$ is the i-th element of the main diagonal of $B$. The plot of $C_i$ versus $i$ (order of the data) can be used to detect potentially influential observations. Ref. [35] proposed considering the i-th observation with $C_i > 2\bar{C}$ as potentially influential, where $\bar{C} = \frac{1}{n}\sum_{i=1}^{n} C_i$.
Another proposal, presented by [33], defines $L_{max}$ as the normalized eigenvector associated with the largest eigenvalue of the matrix $B = \Delta^\top \ddot{L}^{-1} \Delta$. Plotting the elements of $|L_{max}|$ versus $i$ (order of the data) yields a graph that can reveal which type of perturbation has the greatest influence on $LD(\omega)$ in the neighborhood of $\omega_0$ [33]. High-order influential cases are those with strong influence compared to the average $\bar{l}$ of the values $l_j = |l_{max}|_j$ over all cases [36]. The value $\bar{l} + 2\,\mathrm{sd}(l)$, where $\mathrm{sd}(l)$ denotes the standard deviation of $l_j$, $j = 1, \ldots, n$, can be used as a reference to determine the significance of the contribution of an individual case [13].
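Both benchmarks are simple threshold rules, sketched below. The function names and the example values are hypothetical, and the sample standard deviation (divisor $n - 1$) is an assumption of this sketch, since the text does not specify which version is used.

```python
from statistics import mean, stdev

def flag_twice_mean(C):
    """1-based indices i with C_i > 2 * mean(C)."""
    cutoff = 2.0 * mean(C)
    return [i for i, c in enumerate(C, start=1) if c > cutoff]

def flag_mean_plus_2sd(l):
    """1-based indices j with l_j > mean(l) + 2 * sd(l).

    Assumption: sd is the sample standard deviation (divisor n - 1).
    """
    cutoff = mean(l) + 2.0 * stdev(l)
    return [j for j, v in enumerate(l, start=1) if v > cutoff]

# Hypothetical curvature values C_i and |L_max| components l_j.
C = [0.1, 0.2, 0.1, 1.6, 0.2, 0.2]
l = [0.1] * 9 + [1.0]
print(flag_twice_mean(C))        # observation 4 exceeds 2 * 0.4 = 0.8
print(flag_mean_plus_2sd(l))     # observation 10 exceeds the cutoff
```

The same `flag_mean_plus_2sd` rule applies unchanged to any other diagnostic that uses a mean plus two standard deviations benchmark.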

2.9.2. Q-Function Based Diagnostics

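The main goal is to compare $\hat{\theta}$, the ML estimate of $\theta$ under the postulated model, with $\hat{\theta}_\omega$, the ML estimate of $\theta$ under the perturbed model, for $\omega \in \Omega$. Close values indicate that the perturbation has a small effect on the estimation procedure. If, on the other hand, they differ considerably, the estimation procedure may be sensitive to the presence of some observations. To measure this distance, Refs. [32,34] proposed the Q-function displacement, which quantifies the difference between $\hat{\theta}$ and $\hat{\theta}_\omega$ and is defined by:
$$f_Q(\omega) = 2\left[Q(\hat{\theta}|\hat{\theta}) - Q(\hat{\theta}_\omega|\hat{\theta})\right],$$
where $Q(\theta|\hat{\theta}) = E\bigl[l_c(\theta|Y_c)\,\big|\,Y, \hat{\theta}\bigr]$ and $Q(\theta, \omega|\hat{\theta}) = E\bigl[l_c(\theta, \omega|Y_c)\,\big|\,Y, \hat{\theta}\bigr]$ are defined as in Equation (13), with $f_Q(\omega) \geq 0$ for $\omega \in \Omega$ and $f_Q(\omega_0) = 0$. Similarly to [32,33], we study the behavior of the surface $r(\omega) = (\omega^\top, f_Q(\omega))^\top$ and calculate the normal curvature in the unit direction $l$, defined in Equation (17):
$$C_Q = 2\,\bigl|\,l^\top \Delta_\omega^\top\, \ddot{Q}(\hat{\theta}|\hat{\theta})^{-1}\, \Delta_\omega\, l\,\bigr|, \tag{17}$$
where $\lVert l \rVert = 1$; $\ddot{Q}(\hat{\theta}|\hat{\theta}) = \frac{\partial^2 Q(\theta|\hat{\theta})}{\partial\theta\,\partial\theta^\top}$, evaluated at $\theta = \hat{\theta}$, $\omega = \omega_0$; and $\Delta_\omega$ is a $(p + 2) \times n$ matrix given by $\Delta_\omega = (\Delta_{\omega\beta}^\top, \Delta_{\omega\phi}^\top)^\top$, evaluated at $\theta = \hat{\theta}$, $\omega = \omega_0$. Thus:
$$\ddot{Q}(\hat{\theta}|\hat{\theta}) = \begin{pmatrix} \ddot{Q}_{\hat{\beta}\hat{\beta}} & \ddot{Q}_{\hat{\beta}\hat{\phi}} \\ \ddot{Q}_{\hat{\phi}\hat{\beta}} & \ddot{Q}_{\hat{\phi}\hat{\phi}} \end{pmatrix},$$
where
$$\ddot{Q}_{\hat{\beta}\hat{\beta}} = -w(\hat{\delta})\, X^\top \hat{\Sigma}^{-1} X,$$
$$\ddot{Q}_{\hat{\beta}\hat{\phi}} = -w(\hat{\delta})\, X^\top \hat{\Sigma}^{-1}\, \frac{\partial\hat{\Sigma}}{\partial\phi}\, (\hat{\Sigma}^{-1}\otimes I_3)(I_3\otimes\hat{\varepsilon}) = \ddot{Q}_{\hat{\phi}\hat{\beta}}^\top,$$
$$\begin{aligned}
\ddot{Q}_{\hat{\phi}\hat{\phi}} ={} & \frac{1}{2}\,\frac{\partial\,\mathrm{vec}^\top(\hat{\Sigma})}{\partial\phi}\,(\hat{\Sigma}^{-1}\otimes\hat{\Sigma}^{-1})\,\frac{\partial\,\mathrm{vec}(\hat{\Sigma})}{\partial\phi^\top} \\
& - \frac{1}{2}\,\bigl(\mathrm{vec}^\top(\hat{\Sigma}^{-1})\otimes I_3\bigr)\,\frac{\partial}{\partial\phi^\top}\,\mathrm{vec}\!\left(\frac{\partial\,\mathrm{vec}(\hat{\Sigma})}{\partial\phi^\top}\right) \\
& + w(\hat{\delta})\,\frac{\partial\,\mathrm{vec}^\top(\hat{\Sigma})}{\partial\phi}\,\bigl(\hat{\Sigma}^{-1}\otimes\hat{\Sigma}^{-1}\hat{\varepsilon}\hat{\varepsilon}^\top\hat{\Sigma}^{-1}\bigr)\,\frac{\partial\,\mathrm{vec}(\hat{\Sigma})}{\partial\phi^\top} \\
& + w(\hat{\delta})\,\bigl(\mathrm{vec}^\top(\hat{\Sigma}^{-1}\hat{\Sigma}^{-1}\hat{\varepsilon}\hat{\varepsilon}^\top\hat{\Sigma}^{-1})\otimes I_3\bigr)\,\frac{\partial}{\partial\phi^\top}\,\mathrm{vec}\!\left(\frac{\partial\,\mathrm{vec}(\hat{\Sigma})}{\partial\phi^\top}\right),
\end{aligned}$$
and the matrix $\Delta_\omega$ has elements
$$\Delta_{\omega\beta} = \frac{\partial^2 Q(\theta, \omega|\hat{\theta})}{\partial\beta\,\partial\omega^\top} = w(\hat{\delta})\, X^\top \hat{\Sigma}^{-1/2},$$
$$\Delta_{\omega\phi} = \frac{\partial^2 Q(\theta, \omega|\hat{\theta})}{\partial\phi\,\partial\omega^\top}, \text{ with elements}$$
$$\Delta_{\omega\phi_j} = w(\hat{\delta})\,\hat{\Sigma}^{-1/2}\,\frac{\partial\hat{\Sigma}^{1/2}}{\partial\phi_j}\,\hat{\Sigma}^{-1/2}\hat{\varepsilon}, \quad j = 1, 2,$$
where $w(\hat{\delta}) = \frac{1 + n\eta}{\eta}\,c(\eta)\,\hat{q}^{-1}$, with $\hat{q} = 1 + c(\eta)\,\hat{\delta}$ and $\hat{\delta} = (Y - X\hat{\beta})^\top\hat{\Sigma}^{-1}(Y - X\hat{\beta})$. The i-th observation is potentially influential if $C_{Q_i} > 2\bar{C}_Q$, where $\bar{C}_Q = \frac{1}{n}\sum_{i=1}^{n} C_{Q_i}$.
Similar to $|L_{max}|$, we have the local influence measure $|Q_{max}|$, constructed from the displacement of the conditional expectation of the logarithm of the joint-likelihood, the Q-function. High-order influential cases are those with strong influence compared to the average $\bar{Q}$ of the values $Q_j = |Q_{max}|_j$ over all cases. The value $\bar{Q} + 2\,\mathrm{sd}(Q)$, where $\mathrm{sd}(Q)$ denotes the standard deviation of $Q_j$, $j = 1, \ldots, n$, can be used as a reference to determine the significance of the contribution of an individual case [13].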

2.10. Generalized Leverage

The concept of generalized leverage is to measure the influence of the observed value on the response variable y i on its own fitted value y ^ [37,38,39].
Let μ = X β be the expected value of Y . Then, based on the generalized leverage for [39], the generalized leverage matrix, G L ( θ ^ ) = Y / Y ^ , where Y ^ = X β ^ and θ ^ ML estimator of θ . The generalized leverage is defined in Equation (18):
G L ( θ ) = D θ ( L ( θ ) ) 1 L θ Y ,
where D θ = μ / θ T = ( X , 0 ) , and L θ Y = 2 L ( θ ) / θ Y T = ( L β Y T , L ϕ Y T ) T with
L β Y = 2 L ( θ ) / β Y T = w ( δ ) X Σ 1 2 η 1 + n η w ( δ ) 2 X T Σ 1 ε ε T Σ 1 ,
L ϕ Y = 2 L ( θ ) / ϕ Y T with elements
L ϕ j Y = 2 L ( θ ) / ϕ j Y T = w ( δ ) ε T Σ 1 Σ ϕ j Σ 1 η 1 + n η w ( δ ) 2 ε T Σ 1 Σ ϕ j Σ 1 ε ε T Σ 1 ,
for j = 1 , 2 , with w ( δ ) = 1 + n η η c ( η ) q 1 , c ( η ) = η 1 2 η and q = ( 1 + c ( η ) δ ) .
After some algebra, we have G L ( θ ) = GL 1 + GL 2 , where
GL 1 = X ( L β β L β ϕ L ϕ ϕ 1 L ϕ β ) 1 ( L β Y ) and
GL 2 = X ( L β β L β ϕ L ϕ ϕ 1 L ϕ β ) 1 ( L β ϕ L ϕ ϕ 1 L ϕ Y ) , such that
L β β = 2 L ( θ ) β β T = 2 w ( δ ) 2 η 1 + n η X T Σ 1 ε ε T Σ 1 X w ( δ ) X T Σ 1 X ,
L β ϕ = 2 L ( θ ) β ϕ T , with elements
2 L ( θ ) β ϕ l = w ( δ ) 2 η 1 + n η ε T Σ 1 Σ ϕ l Σ 1 ε X T Σ 1 ε w ( δ ) X T Σ 1 Σ ϕ l Σ 1 ε , for l = 1 , 2 ,
L ϕ ϕ = 2 L ( θ ) ϕ ϕ T , with elements
2 L ( θ ) ϕ l ϕ j = 1 2 t r Σ 1 2 Σ ϕ l ϕ j Σ ϕ j Σ 1 Σ ϕ l
1 2 w ( δ ) 2 η 1 + n η ε T Σ 1 Σ ϕ j Σ 1 ε ε T Σ 1 Σ ϕ l Σ 1 ε
+ 1 2 w ( δ ) ε T Σ 1 2 Σ ϕ l ϕ j 2 Σ ϕ j Σ 1 Σ ϕ l Σ 1 ε , for j , l = 1 , 2 .
The diagonal elements G L i i for i = 1 , , n , of the matrix G L ( θ ^ ) are used as a diagnostic tool of influence in the vector Y ^ . The ith response is potentially influential if G L i i > G L ¯ + 2 s d ( G L ) , where G L ¯ = 1 n i = 1 n G L i i and s d ( G L ) is the standard deviation of G L 11 , , G L n n [21].
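Based on the proposals of [39] and [32], the generalized leverage matrix for models with complete data takes the form given in Equation (19):
$$GL_Q(\theta) = D_{\theta} \left(-\ddot{Q}\right)^{-1} Q_{\theta Y}, \tag{19}$$
such that
$$\ddot{Q} = \ddot{Q}(\theta|\hat{\theta}) = \frac{\partial^2 Q(\theta|\hat{\theta})}{\partial\theta\,\partial\theta^T},$$
$$Q_{\theta Y} = \left.\frac{\partial^2 Q(\theta|\hat{\theta})}{\partial\theta\,\partial Y^T}\right|_{\theta = \hat{\theta}} = (Q_{\beta Y}^T, Q_{\phi Y}^T)^T,$$
with
$$Q_{\beta Y} = \left.\frac{\partial^2 Q(\theta|\hat{\theta})}{\partial\beta\,\partial Y^T}\right|_{\theta = \hat{\theta}} = w(\hat{\delta})\, X^T \hat{\Sigma}^{-1}$$
and $Q_{\phi Y} = \partial^2 Q(\theta|\hat{\theta}) / \partial\phi\,\partial Y^T$, with elements
$$Q_{\phi_j Y} = \left.\frac{\partial^2 Q(\theta|\hat{\theta})}{\partial\phi_j\,\partial Y^T}\right|_{\theta = \hat{\theta}} = w(\hat{\delta})\, \hat{\varepsilon}^T \hat{\Sigma}^{-1} \frac{\partial\Sigma}{\partial\phi_j} \hat{\Sigma}^{-1},$$
for $j = 1, 2$, with $w(\hat{\delta}) = \frac{1+n\eta}{\eta}\, c(\eta)\, \hat{q}^{-1}$, $c(\eta) = \frac{\eta}{1-2\eta}$ and $\hat{q} = 1 + c(\eta)\hat{\delta}$. The $i$th response is potentially influential if $GL_{Q_{ii}} > \overline{GL_Q} + 2\,sd(GL_Q)$, where $\overline{GL_Q} = \frac{1}{n} \sum_{i=1}^{n} GL_{Q_{ii}}$ and $sd(GL_Q)$ is the standard deviation of $GL_{Q_{11}}, \ldots, GL_{Q_{nn}}$.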
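The weight $w(\delta)$ that scales the Q-function derivatives can be evaluated directly from its definition. The sketch below is illustrative only: the function names are hypothetical, and the restriction $0 < \eta < 1/2$ is an assumption implied by the form of $c(\eta)$ rather than a statement quoted from this section.

```python
def c_eta(eta):
    """c(eta) = eta / (1 - 2*eta); assumes 0 < eta < 1/2."""
    return eta / (1.0 - 2.0 * eta)

def weight(delta, n, eta):
    """w(delta) = ((1 + n*eta) / eta) * c(eta) / q, with q = 1 + c(eta)*delta."""
    c = c_eta(eta)
    q = 1.0 + c * delta
    return (1.0 + n * eta) / eta * c / q

# With n = 78 and eta = 0.25 (values used in the application),
# the weight shrinks as the squared distance delta grows.
for delta in (10.0, 78.0, 300.0):
    print(delta, weight(delta, n=78, eta=0.25))
```

Cases with a large $\delta$ therefore receive a smaller weight, and for $\eta$ close to zero the weight approaches 1 for every case, which is the behavior expected under the normal model.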

3. Results

Application to Real Data Set

The data set was collected from a commercial agricultural area of 127.18 ha of grain production, located near the city of Cascavel, in the Western region of Paraná, Brazil. The latitude and longitude coordinates of the area are approximately 24.95° S and 53.57° W, with an average altitude of 650 m. According to the classification of Köppen, the climate of the region is type Cfa, and the soil was classified as Oxisol.
The response variable is the soybean productivity (Prod) [t ha−1], and the explanatory variables are the chemical contents of the soil: phosphorus (P) [mg dm−3], potassium (K) [cmolc dm−3], hydrogen potential (pH) and organic matter (OM) [g dm−3]. The linear spatial model for the soybean productivity at the site $s_i$ is given by $Prod(s_i) = \beta_1 + \beta_2 P(s_i) + \beta_3 K(s_i) + \beta_4 pH(s_i) + \beta_5 OM(s_i) + e(s_i)$, $i = 1, \ldots, 78$.
Table 1 presents a brief descriptive analysis. The minimum productivity was 1.87 t ha−1, the maximum was 3.18 t ha−1, and the average was 2.37 t ha−1. This preliminary analysis helps to characterize the data set.
The boxplot of soybean productivity in Figure 1a shows an outlier, observation #33, which is the maximum value of the data, with productivity equal to 3.176 t ha−1 (Table 1). According to the post-plot in Figure 1b, observation #33 is surrounded by observations with a soybean productivity lower than 2.5 t ha−1.
Outliers were also observed in the boxplots of the explanatory variables. For P, three outliers were identified: observations #62, #68, and #69, with respective values of 50.3, 52.5, and 58.6 mg dm−3. For pH, the boxplot highlighted observations #56, #55, #14, and #4, with respective values of 5.8, 6.0, 6.0, and 6.1. For OM, only one outlier was detected: observation #71, with 63.37 g dm−3. The directional semivariogram in Figure 1c, for directions 0°, 45°, 90°, and 135°, indicates that it is reasonable to assume isotropy, since the spatial dependence structure is similar in the directions considered.
Table 2 presents the parameter estimates, with asymptotic standard errors in parentheses, for the reparameterized t-Student linear model, considering different values of the parameter $\eta$ and different values of $\kappa$ for the Matérn family of geostatistical models. The parameter $\phi_3$ was fixed at 0.3, as obtained from a previous analysis using the ordinary least squares method.
The chosen model is $\widehat{Prod}(s_i) = 1.9712 + 0.0030\,P(s_i) - 0.0997\,K(s_i) + 0.0379\,pH(s_i) + 0.0040\,OM(s_i)$, with an estimated covariance matrix given by $\hat{\Sigma} = 0.0585\, I_{78} + 0.0857\, R(0.3)$, where the elements of the correlation matrix $R$ are determined from the Matérn family of models. The values $\kappa = 0.5$ and $\eta = 0.25$ were selected according to the cross-validation criterion (CV) and the trace (Tr) of the asymptotic covariance matrix.
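To make the structure of the estimated covariance matrix concrete, the sketch below builds a nugget-plus-contribution covariance with an exponential correlation, which is the Matérn model with $\kappa = 0.5$. It assumes the correlation takes the form $\exp(-h/\phi_3)$ for distance $h$; the paper's exact Matérn parametrization is not given in this section, and the coordinates and function name are hypothetical.

```python
import numpy as np

def exponential_covariance(coords, phi1, phi2, phi3):
    """Sigma = phi1 * I + phi2 * R, with R_ij = exp(-h_ij / phi3) (assumed form)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    R = np.exp(-h / phi3)                      # exponential correlation (kappa = 0.5)
    return phi1 * np.eye(len(coords)) + phi2 * R

# Hypothetical site coordinates in the same distance units as phi3.
coords = [[0.0, 0.0], [0.3, 0.0], [0.0, 0.6], [0.9, 0.9]]
Sigma = exponential_covariance(coords, phi1=0.0585, phi2=0.0857, phi3=0.3)
print(np.round(Sigma, 4))
```

Under this form, each diagonal entry is the nugget plus the contribution, and the covariance between two sites decays with their distance at a rate set by $\phi_3$.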
In the diagnostic analysis of the response variable, shown in Figure 2 and Figure 3, observations #17, #18, #33, #70, #71, #72, #76 and #78 are highlighted as influential points by Cook’s distance $D_i$, and observations #18, #46, #70, #76, and #78 are highlighted as influential points by the $QD_i$ distance of the Q-function.
Figure 4 and Figure 5 show the local influence graphs; observations #33 and #71 were considered influential points in the plots of $C_i$, $|L_{max}|$, and $|Q_{max}|$ versus index. However, observations #4, #14, #62, and #68 are considered influential by the plot of $CQ_i$. Observation #33 is the same as the one detected in the boxplot in Figure 1a.
Figure 6 presents the graphs for the generalized leverage, which detected observations that had also been detected in the boxplots of the explanatory variables. The observations are #62, #68 and #69 for P, #4, #14, #55 and #56 for pH and #71 for OM.
Figure 7 presents the QQ-plots of the residuals. When observation #33 is removed, a better fit is obtained at the top of the QQ-plot (Figure 7b), with most of the points closer to the line. In both graphs all points lie within the confidence band, showing that the data follow the assumed distribution.
Figure 8 presents the maps of predicted values considering all observations and after deleting observation #33. There is a substantial change in the region of the map where observation #33 is located (see also Figure 1b). Thus, this observation is an outlier and an influential observation in obtaining the predicted values.
To measure the similarity between the reference map (all points) and the model map (without point #33), the global accuracy (AG), Tau (T), and Kappa (Kp) indices were computed, giving AG = 0.73, T = 0.64 and Kp = 0.59, which suggests a difference between the thematic maps [40,41].

4. Discussion

To explain the average soybean productivity, a multiple spatial linear regression model with spatially correlated errors was constructed, considering as explanatory variables the chemical contents of the soil, P, K, pH, and OM. The parameters estimated by maximum likelihood are presented in Table 2. Considering the CV criterion and the trace (Tr), the best model for the spatial dependence structure was the Matérn model with shape parameter $\kappa = 0.5$ (exponential model), combined with the shape parameter $\eta = 0.25$ of the reparameterized t-Student distribution.
Point #33, considered an outlier (Figure 1), was globally and locally influential (Figure 2, Figure 4 and Figure 5). When it was removed, the QQ-plot of the residuals (Figure 7) behaved better, indicating that this observation also influences the probability distribution of the residuals; as noted in [5], values that deviate considerably from the straight line indicate influential cases.
When comparing the map generated with all sample points and the map without point #33 (Figure 8), the global accuracy indicator (AG = 0.73) indicates that the maps are dissimilar [40]. According to the classification of similarity indices, Tau ($T = 0.64$) and Kappa ($K_p = 0.59$) indicate that the maps have low similarity [40,41]. Thus, the thematic maps constructed with and without point #33 are classified differently, and point #33 is confirmed as influential in the elaboration of the map. The formulas for the indices $AG$, $T$, and $K_p$ are presented by [42].
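For reference, the three indices can be computed from a confusion matrix that cross-tabulates the classes of the two maps. The sketch below uses the standard definitions (overall accuracy, Cohen's Kappa, and Tau with equal a priori class probabilities); whether [42] uses exactly these forms is an assumption, and the function name and the confusion matrix are hypothetical.

```python
import numpy as np

def map_similarity(confusion):
    """Return (AG, Kappa, Tau) for a square confusion matrix of class counts."""
    m = np.asarray(confusion, dtype=float)
    total = m.sum()
    ag = np.trace(m) / total                                   # global accuracy
    pe = (m.sum(axis=0) * m.sum(axis=1)).sum() / total ** 2    # chance agreement
    kappa = (ag - pe) / (1.0 - pe)
    pr = 1.0 / m.shape[0]                                      # equal a priori class probability
    tau = (ag - pr) / (1.0 - pr)
    return ag, kappa, tau

# Hypothetical 2-class cross-tabulation of pixels (reference map vs. model map).
confusion = [[40, 10],
             [5, 45]]
ag, kappa, tau = map_similarity(confusion)
print(ag, kappa, tau)
```

Kappa and Tau both correct the raw agreement for agreement expected by chance, which is why they are lower than AG whenever the maps agree only moderately.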
The graphs of the generalized leverage indices LG and $LG_Q$ (Figure 6) detected leverage points, indicating the existence of influential points in the explanatory variables and in the spatial linear regression model that affect the estimates of the model parameters. Figure 6 detected observations #62, #68, and #69 for phosphorus (P); observations #4, #14, #55, and #56 for hydrogen potential (pH); and observation #71 for organic matter (OM). No leverage points were detected for potassium (K).
With information from the productivity map and explanatory variables, differentiated management units (DMUs) can be created, detecting subareas in the agricultural property under study that have similar characteristics in terms of production potential, helping the farmer apply differentiated management strategies for each subarea.

5. Conclusions

The reparameterized t-Student distribution was presented in the geostatistical framework. The reparameterization enables a straightforward comparison between the covariance matrices of multivariate distributions, especially with the multivariate normal distribution. The study also used global and local influence diagnostics to choose a more appropriate model to fit the data.
The reparameterized t-Student linear spatial modeling allows for more robust modeling in the presence of influential observations. It is extremely important to detect the influential points and relate them to their spatial location, in order to draw, from the diagnostic analysis, sound conclusions about whether or not to remove these points. Investigating these points is indispensable for decision-making, ensuring that the information contained in the created maps is more consistent with reality and may be used in precision agriculture for the creation of differentiated management units.

Author Contributions

Conceptualization, M.A.U.-O., R.C.S., F.D.B. and M.G.; methodology, M.A.U.-O., F.D.B. and M.G.; software, R.C.S., F.D.B., R.A.B.A. and T.C.M.; validation, M.A.U.-O., R.C.S., F.D.B., M.G., R.A.B.A. and T.C.M.; formal analysis, M.A.U.-O., R.C.S., F.D.B., M.G. and T.C.M.; investigation, M.A.U.-O., R.C.S., F.D.B., M.G., R.A.B.A. and T.C.M.; resources, M.A.U.-O., R.C.S., F.D.B., M.G. and T.C.M.; data curation, M.A.U.-O., R.C.S., F.D.B., M.G. and T.C.M.; writing—original draft preparation, M.A.U.-O., R.C.S., F.D.B., M.G. and T.C.M.; writing—review and editing, M.A.U.-O., R.C.S., F.D.B., M.G. and T.C.M.; visualization, M.A.U.-O., R.C.S., F.D.B., M.G. and T.C.M.; supervision, M.A.U.-O., R.C.S., F.D.B., M.G., R.A.B.A. and T.C.M.; project administration, M.A.U.-O., R.C.S., F.D.B., M.G. and T.C.M.; funding acquisition, M.A.U.-O., F.D.B., M.G. and T.C.M. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to thank the Coordination for the Improvement of Higher Education Personnel (CAPES), Financing Code 001, the Araucária Foundation of the State of Paraná, and the National Council for Scientific and Technological Development (CNPq) for their financial support (process numbers 306561/2020-4, 310050/2019-7, 404872/2023-9, and 302413/2022-7), as well as the Fundação de Amparo a Ciência e Tecnologia de Pernambuco (FACEPE).

Data Availability Statement

The data sets presented in this article are not readily available because the data belong to a group of researchers at the University and are currently part of ongoing studies by researchers in the field of spatiotemporal statistics.

Acknowledgments

The authors thank the Laboratory of Spatial-Temporal Statistics (LEE), UNIOESTE, Cascavel, PR, Brazil.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AG: Global accuracy
CV: Cross-validation criterion
DMUs: Differentiated management units
K: Potassium
κ: Matérn model parameter
Kp: Kappa
ML: Maximum likelihood
n: Number of observations
OM: Organic matter
P: Phosphorus
pH: Hydrogen potential
Prod: Productivity
T: Tau
Tr: Trace

References

  1. Galea, M.; Bolfarine, H.; Vilcalabra, F. Influence diagnostics for the structural errors-in-variables model under the Student-t distribution. J. Appl. Stat. 2002, 29, 1191–1204.
  2. Galea, M.; de Castro, M. Robust inference in a linear functional model with replications using the t distribution. J. Multivar. Anal. 2017, 160, 134–145.
  3. Martínez, S.; Giraldo, R.; Leiva, V. Birnbaum–Saunders functional regression models for spatial data. Stoch. Environ. Res. Risk Assess. 2019, 33, 1765–1780.
  4. Ordoñez, J.A.; Prates, M.O.; Matos, L.A.; Lachos, V.H. Objective Bayesian analysis for spatial Student-t regression models. J. Spat. Sci. 2020.
  5. Lange, K.L.; Little, R.J.A.; Taylor, J.M.G. Robust statistical modeling using the t distribution. JASA 1989, 84, 881–896.
  6. Uribe-Opazo, M.A.; De Bastiani, F.; Galea, M.; Schemmer, R.C.; Assumpção, R.A.B. Appropriate perturbation scheme for the covariance matrix of a t-Student spatial linear model. Spat. Stat. 2021, 41, 100481.
  7. Richetti, J.; Uribe-Opazo, M.A.; De Bastiani, F.; Johann, J.S. Techniques for detection of influencing points in regionalized continuous variables. Eng. Agríc. 2016, 36, 152–165.
  8. Osorio, F.; Paula, G.A.; Galea, M. Assessment of local influence in elliptical linear models with longitudinal structure. Comput. Stat. Data Anal. 2007, 51, 4354–4368.
  9. Uribe-Opazo, M.A.; Borssoi, J.A.; Galea, M. Influence diagnostics in Gaussian spatial linear models. J. Appl. Stat. 2012, 39, 615–630.
  10. De Bastiani, F.; Galea, M.; Cysneiros, A.H.M.A.; Uribe-Opazo, M.A. Gaussian spatial linear models with repetitions: An application to soybean productivity. Spat. Stat. 2017, 21, 319–335.
  11. De Bastiani, F.; Uribe-Opazo, M.A.; Galea, M.; Cysneiros, A.H.M.A. Case-deletion diagnostics for spatial linear mixed models. Spat. Stat. 2018, 28, 284–303.
  12. Lachos, V.H.; Matos, L.A.; Barbosa, T.S.; Garay, A.M.; Dey, D.K. Influence diagnostics in spatial models with censored response. Environmetrics 2017, 28, e2464.
  13. Zhu, H.; Lee, S.; Wei, B.; Zhou, J. Case-Deletion Measures for Models with Incomplete Data. Biometrika 2001, 88, 727–737.
  14. Osorio, F. MVT: Estimation and Testing for the Multivariate t-Distribution. R Package Version 0.3-8. 2023. Available online: https://CRAN.R-project.org/package=MVT (accessed on 1 August 2024).
  15. Mardia, K.V.; Marshall, R.J. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 1984, 71, 135–146.
  16. Stein, M.L. Interpolation of Spatial Data: Some Theory for Kriging; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; pp. 1–249.
  17. Acosta, J.; Osorio, F.; Vallejos, R. Effective sample size for line transect sampling models with an application to marine macroalgae. J. Agric. Biol. Environ. Stat. 2016, 21, 407–425.
  18. Waller, L.A.; Gotway, C.A. Applied Spatial Statistics for Public Health Data; John Wiley & Sons: Hoboken, NJ, USA, 2004; p. 368.
  19. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; John Wiley & Sons: Hoboken, NJ, USA, 2019; p. 479.
  20. Zellner, A. Bayesian and non-Bayesian analysis of the regression model with multivariate Student-t error terms. JASA 1976, 71, 400–405.
  21. De Bastiani, F.; Cysneiros, A.H.L.A.; Uribe-Opazo, M.A.; Galea, M. Influence diagnostics in elliptical spatial linear models. Test 2015, 4, 322–340.
  22. Kano, Y.; Berkane, M.A.; Bentler, M. Statistical inference based on pseudo-maximum likelihood estimators in elliptical populations. JASA 1993, 88, 135–143.
  23. Matérn, B. Spatial Variation; Lecture Notes in Statistics; Springer: Berlin/Heidelberg, Germany, 1986; Volume 36.
  24. Dalposso, G.H.; Uribe-Opazo, M.A.; Johann, J.A.; Galea, M.; De Bastiani, F. Gaussian spatial linear model of soybean yield using bootstrap methods. Eng. Agríc. 2018, 38, 110–116.
  25. Almeida, A.; Loy, A.; Hofmann, H. ggplot2 compatible quantile-quantile plots in R. R J. 2018, 10, 248–261.
  26. Wickham, H. ggplot2. Wiley Interdiscip. Rev. Comput. Stat. 2011, 3, 180–185.
  27. Cook, R.D.; Weisberg, S. Residuals and Influence in Regression; Chapman and Hall: New York, NY, USA, 1982.
  28. Chatterjee, S.; Hadi, A.S. Sensitivity Analysis in Linear Regression; John Wiley & Sons: Hoboken, NJ, USA, 1986; p. 327.
  29. Christensen, R.; Johnson, W.; Pearson, L.M. Prediction diagnostics for spatial linear models. Biometrika 1992, 79, 583–591.
  30. Pan, J.; Fei, Y.; Foster, P. Case-deletion diagnostics for linear mixed models. Technometrics 2014, 56, 269–281.
  31. Cook, R.D. Detection of influential observation in linear regression. Technometrics 1977, 19, 15–18.
  32. Zhu, H.; Lee, S. Local influence for incomplete data models. J. R. Stat. Soc. B Stat. Methodol. 2001, 63, 11–126.
  33. Cook, R.D. Assessment of local influence. J. R. Stat. Soc. B Methodol. 1986, 48, 133–155.
  34. Zhu, H.; Ibrahim, J.G.; Lee, S.; Zhang, H. Perturbation selection and influence measures in local influence analysis. Ann. Stat. 2007, 35, 2565–2588.
  35. Verbeke, G.; Molenberghs, G. Linear Mixed Models for Longitudinal Data; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.
  36. Poon, W.Y.; Poon, Y.S. Conformal normal curvature and assessment of local influence. Stat. Methodol. 1999, 61, 51–61.
  37. Hoaglin, D.C.; Welsch, R.E. The hat matrix in regression and ANOVA. Ann. Stat. 1978, 32, 17–22.
  38. St Laurent, R.T.; Cook, R.D. Leverage and superleverage in nonlinear regression. JASA 1992, 87, 985–990.
  39. Wei, X.; Samarabandu, J.; Devdhar, R.S.; Siegel, A.J.; Acharya, R.; Berezney, R. Segregation of transcription and replication sites into higher order domains. Science 1998, 281, 1502–1505.
  40. Anderson, J.R. A Land Use and Land Cover Classification System for Use with Remote Sensor Data; US Government Printing Office: Washington, DC, USA, 1976; p. 964.
  41. Krippendorff, K. Content Analysis: An Introduction to Its Methodology; Sage Publications: Thousand Oaks, CA, USA, 1980.
  42. De Bastiani, F.; Uribe-Opazo, M.A.; Dalposso, G.H. Comparison of maps of spatial variability of soil resistance to penetration constructed with and without covariables using a spatial linear model. Eng. Agríc. 2012, 32, 393–404.
Figure 1. Boxplot (a), post-plot (b) and directional semivariogram (c) for the soybean productivity data set (t ha−1).
Figure 2. Global influence diagnostic plots D i β (a) and D i ϕ (b).
Figure 3. Global influence diagnostics plots Q D i β (a) and Q D i ϕ (b).
Figure 4. Local influence diagnostic plots $C_i$ (a) and $l_{max}$ (b).
Figure 5. Local influence diagnostic plots $CQ_i$ (a) and $Q_{max}$ (b).
Figure 6. Generalized leverage plots considering (a) the log-likelihood function (LG) and (b) the Q-function ($LG_Q$).
Figure 7. QQ-plots of the residuals (a) with all points and (b) without point #33.
Figure 8. Maps for the data set (a) with all observations and (b) deleting observation #33.
Table 1. Descriptive statistics for the variables productivity (Prod), phosphorus (P), potassium (K), hydrogen potential (pH), and organic matter (OM).

| Statistic | Prod (t ha−1) | P (mg dm−3) | K (cmolc dm−3) | pH | OM (g dm−3) |
|---|---|---|---|---|---|
| n | 78 | 78 | 78 | 78 | 78 |
| Average | 2.37 | 19.19 | 0.31 | 4.82 | 50.63 |
| Minimum | 1.87 | 3.40 | 0.10 | 4.20 | 38.62 |
| Maximum | 3.18 | 58.60 | 0.67 | 6.10 | 66.37 |
| Median | 2.33 | 16.90 | 0.28 | 4.75 | 50.81 |

n: number of observations.
Table 2. Parameter estimates and asymptotic standard errors (in parentheses).

| $\kappa$ | $\eta$ | $\hat{\beta}_1$ | $\hat{\beta}_2$ | $\hat{\beta}_3$ | $\hat{\beta}_4$ | $\hat{\beta}_5$ | $\hat{\phi}_1$ | $\hat{\phi}_2$ |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.05 | 1.9712 (0.4227) | 0.0030 (0.0028) | −0.0997 (0.2269) | 0.0379 (0.0752) | 0.0040 (0.0055) | 0.0330 (0.0428) | 0.0482 (0.0491) |
| 0.5 | 0.10 | 1.9712 (0.4217) | 0.0030 (0.0028) | −0.0997 (0.2263) | 0.0379 (0.0750) | 0.0040 (0.0054) | 0.0368 (0.0493) | 0.0540 (0.0576) |
| 0.5 | 0.25 | 1.9712 (0.4205) | 0.0030 (0.0028) | −0.0997 (0.2257) | 0.0379 (0.0748) | 0.0040 (0.0054) | 0.0585 (0.0849) | 0.0857 (0.1032) |
| 0.5 | 0.45 | 1.9712 (0.4231) | 0.0030 (0.0029) | −0.0997 (0.2271) | 0.0379 (0.0753) | 0.0040 (0.0055) | 0.2960 (0.4693) | 0.4336 (0.5915) |
| 1.0 | 0.05 | 2.0125 (0.5684) | 0.0037 (0.0039) | 0.1263 (0.3156) | 0.0425 (0.1043) | 0.0025 (0.0073) | 0.1362 (0.0971) | 0.0113 (0.0847) |
| 1.0 | 0.10 | 2.0125 (0.6664) | 0.0037 (0.0046) | 0.1263 (0.3700) | 0.0425 (0.1223) | 0.0025 (0.0085) | 0.2102 (0.1644) | 0.0174 (0.1309) |
| 1.0 | 0.25 | 2.0125 (0.8984) | 0.0037 (0.0062) | 0.1263 (0.4988) | 0.0425 (0.1649) | 0.0025 (0.0115) | 0.6101 (0.5852) | 0.0504 (0.3813) |
| 1.0 | 0.45 | 2.0125 (1.1362) | 0.0037 (0.0078) | 0.1263 (0.6308) | 0.0425 (0.2085) | 0.0025 (0.0146) | 4.8771 (5.6256) | 0.4032 (3.0600) |
| 1.5 | 0.05 | 2.0118 (1.0406) | 0.0037 (0.0072) | 0.1258 (0.5774) | 0.0425 (0.1909) | 0.0025 (0.0133) | 0.3658 (0.8091) | 0.1284 (0.8023) |
| 1.5 | 0.10 | 2.0118 (1.3459) | 0.0037 (0.0093) | 0.1258 (0.7468) | 0.0425 (0.2469) | 0.0025 (0.0173) | 0.6869 (1.5368) | 0.2411 (1.5100) |
| 1.5 | 0.25 | 2.0118 (1.9992) | 0.0037 (0.0137) | 0.1258 (1.1093) | 0.0425 (0.3667) | 0.0025 (0.0256) | 2.4208 (5.5842) | 0.8496 (5.3471) |
| 1.5 | 0.45 | 2.0118 (2.6287) | 0.0037 (0.0181) | 0.1258 (1.4586) | 0.0425 (0.4822) | 0.0025 (0.0337) | 20.9157 (50.0842) | 7.3408 (46.4495) |

$\kappa$: kappa value of the Matérn model; $\eta$: shape parameter of the reparameterized t-Student distribution; $\hat{\beta}_j$: estimates of the regression parameters, $j = 1, 2, 3, 4, 5$; $\hat{\phi}_i$: estimates of the covariance matrix parameters, $i = 1, 2$, where $\phi_1 = \tau_1$ and $\phi_2 = \tau_2 / \tau_3^{2\kappa}$, with $\tau_1$ the nugget effect and $\tau_2$ the contribution.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Uribe-Opazo, M.A.; Schemmer, R.C.; De Bastiani, F.; Galea, M.; Assumpção, R.A.B.; Maltauro, T.C. Q-Function-Based Diagnostic and Spatial Dependence in Reparametrized t-Student Linear Model. Mathematics 2025, 13, 3035. https://doi.org/10.3390/math13183035
