Article

Fernandez–Steel Skew Normal Conditional Autoregressive (FSSN CAR) Model in Stan for Spatial Data

Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Jl. Arif Rahman Hakim, Surabaya 60111, Indonesia
*
Author to whom correspondence should be addressed.
Symmetry 2021, 13(4), 545; https://doi.org/10.3390/sym13040545
Submission received: 9 February 2021 / Revised: 21 March 2021 / Accepted: 23 March 2021 / Published: 26 March 2021
(This article belongs to the Section Mathematics)

Abstract

In spatial data analysis, the conditional autoregressive (CAR) prior is used to express the spatial dependence of random effects among adjacent regions. This paper proposes an extension of the existing normal CAR model into a more flexible Fernandez–Steel skew normal (FSSN) CAR model. This approach is able to capture spatial random effects that have both symmetrical and asymmetrical patterns. The FSSN CAR model is built on the normal CAR model with an additional skewness parameter. The FSSN distribution is able to provide good estimates for symmetric data with heavy or light tails, as well as for right-skewed and left-skewed data. The approach is demonstrated by implementing the FSSN distribution and the FSSN CAR model for spatial data in the Stan language. On the basis of the plots of the estimation results and the histograms of the model errors, the FSSN CAR model was shown to behave better than both the model without a spatial effect and the model with the normal CAR prior. Moreover, the FSSN CAR model has the smallest widely applicable information criterion (WAIC) and leave-one-out (LOO) values, which confirms it as the best of the models compared.

1. Introduction

Spatial analysis is an analytic approach that takes into account an important aspect of spatial data, namely the presence of spatial effects. In general, spatial data can be modeled using various methods. However, if the spatial effects in the data are not considered, the estimated model will be imprecise. Data that exhibit neighborhood effects should therefore be analyzed spatially, in order to show how one region affects other regions. For example, observed disease or virus data should be treated as spatial data and analyzed spatially, because diseases and viruses spread easily, especially between areas that are close together. Approaches to spatial effects fall into two categories: the geostatistical approach and the lattice approach. In this paper, the lattice approach is used, which means that the spatial influence is determined by neighboring regions [1]. Spatial lattice effects enter the model as a random effects component, which captures variation that a model without random effects leaves unexplained [2]. If spatial effects are not included in the modeling, the results still contain spatial autocorrelation, so that the magnitude of the risk effect between neighboring regions is unknown [2]. According to Rantini et al. [3], a survival model with a normal conditional autoregressive (CAR) spatial effect is more representative of correlated spatial data than a survival model without spatial random effects. The spatial dependence of random effects between regions that are close together is expressed through the CAR prior [2,4].
To date, many studies have analyzed spatial data. Banerjee et al. [1] studied spatial effects in survival data, modeling infant mortality in Minnesota with the Weibull distribution and a normal CAR spatial effect. A similar study was conducted by Darmofal [2], who applied the Weibull distribution with a normal CAR spatial effect to political cases. The Weibull distribution with a normal CAR spatial effect has also been used to analyze dengue hemorrhagic fever (DHF) by Iriawan et al. [5]. Another study of DHF data using a spatial CAR model was presented by Aswi et al. [6], in which the Weibull survival model was given three different priors: the Leroux prior, the intrinsic conditional autoregressive (ICAR) or normal CAR prior, and an independent prior. A study that used a non-normal CAR model was presented by Motarjem et al. [7]. They modeled asthma data in Tehran using the Weibull distribution with a normal CAR spatial effect and compared the result with a non-normal CAR model, namely the double-exponential (DE) CAR model. The normal CAR and DE CAR models have also been compared by Rantini and colleagues [8,9], who modeled DHF data in the Pamekasan district and in eastern Surabaya, respectively, both in East Java, Indonesia.
There are several methods for estimating model parameters. From the Bayesian perspective, one of them is Hamiltonian Monte Carlo (HMC), which is a Markov chain Monte Carlo (MCMC) algorithm, like Gibbs sampling and the Metropolis–Hastings algorithm [10,11,12]. HMC provides a powerful MCMC sampling algorithm [10,13] and has several advantages. Firstly, HMC can converge to the posterior probability density with fewer MCMC samples than required by the derivative-free Metropolis–Hastings algorithm [14]. This is because HMC uses numerical integration schemes that evolve the entire system in parallel through small steps, in order to avoid non-locality problems [11]. Secondly, HMC provides an ergodic Markov chain with a high acceptance probability, even for large transitions [15,16,17]. Thirdly, HMC allows for a more efficient exploration of the state space than standard random-walk proposals, due to its mechanism for generating proposals with a high acceptance probability within the Metropolis–Hastings framework [13]. Fourthly, HMC performs better than the Metropolis or Gibbs samplers, because it is able to propose transitions in the Markov chain that lie far apart in the sampling space [12].
Meanwhile, only spatial modeling with symmetrical CAR models, i.e., the normal CAR and double-exponential (DE) CAR models, is provided in software such as Bayesian inference using Gibbs sampling (BUGS) [18]. In BUGS, “car.normal” and “car.l1” are the functions for the normal CAR and DE CAR models, respectively [18]. Relaxing the symmetry of the CAR model therefore requires researchers to write short subprograms as additional user-defined utilities. Stan is an open-source probabilistic programming language designed for Bayesian analysis; it uses HMC and can help to realize such a relaxation of the symmetrical CAR model. It rivals BUGS and just another Gibbs sampler (JAGS), which use MCMC [19]. HMC reaches inference more efficiently than the MCMC used in BUGS and JAGS [19]. Stan lets users create additional user-defined distributions written in Stan code [20]. This presents a great opportunity for researchers to be more creative in their data-driven analysis. Adding a user-defined distribution or model is easier in Stan than in BUGS or JAGS. Wetzels et al. [21] found that adding a new distribution to BUGS involves quite complex steps and requires another program, namely the BlackBox Component Builder. As with BUGS, adding a new distribution in JAGS is also complicated, as can be seen in Wabersich et al. [22]. Stan has its own programming language for defining and adding statistical models [20]. The Stan modeling language can be used through interfaces in several software environments.
Some of Stan’s interfaces are RStan (R), PyStan (Python), CmdStan (shell, command-line terminal), CmdStanR (R, lightweight wrapper for CmdStan), CmdStanPy (Python, lightweight wrapper for CmdStan), MatlabStan (MATLAB), Stan.jl (Julia), StataStan (Stata), MathematicaStan (Mathematica), ScalaStan (Scala) (see the interfaces in https://mc-stan.org/users/interfaces/). Among these Stan interfaces, R (RStan) and Python (PyStan) are the most popularly used [20]. With Stan’s interface in several software tools, the researcher can generalize the proposed method. In this study, we used RStan.
Considering that many studies have used the normal CAR model, Stan provides researchers with a new opportunity to conduct data-driven analysis by modeling spatial effects with a non-normal CAR model. Building on the DE CAR model used by Motarjem et al. [7], researchers can develop other, more flexible non-normal CAR models. In the research by Rantini et al. [8], it was shown that the DE CAR model is no better than the normal CAR model. This research provided evidence that the DE CAR model is not always more robust than the normal CAR, as stated by Motarjem et al. [7]. This is one of the reasons for proposing the Fernandez–Steel skew normal (FSSN) CAR model. Skewness is one of the features of skew-symmetric distributions [23]. Compared to the normal and DE distributions, which can only capture data with a symmetrical pattern, the FSSN distribution is more flexible, as it can describe both symmetrical and asymmetrical data [24]. In the study by Castillo et al. [24], who modeled volcanic data with less symmetrical histograms, the FSSN distribution was more favorable than the normal distribution. This supports our proposal, since the distribution of spatial effects is not always symmetrical. One advantage of the FSSN distribution over the skew-normal (SN) distribution of Azzalini [25] is that its Fisher information matrix does not have a singularity problem, unlike the corresponding Fisher information matrix of the SN distribution [24].
In real-world problems, the available data are not always large enough, which becomes a challenge. This can affect the variation of the model parameters [26]. To solve this problem, a fairly strong simulation is suggested, especially through the Bayesian approach [27]. This approach is claimed to be suitable for data-driven applications [28]. In statistical theory, frequentist approaches are better suited only to modeling large amounts of data. Bayesian approaches, on the other hand, can be used not only for large datasets but also for limited ones [26]. To do so, the Bayesian approach emphasizes an accurate choice of priors and simulation from the model. Monte Carlo simulation is effective in dealing with this problem [28,29]. Regarding dependence modeling for small data, Zhang and Shields [30] demonstrated that applying the Gaussian dependence assumption yields biased estimates when the dependence structure deviates from this assumption; they employed a copula dependence model to solve this problem. To handle the small data in this study, we propose a simulation with HMC applied to a non-normal distribution, namely the FSSN distribution, which is finally applied to data with spatial dependence.
The aim of this study is to present the user-defined FSSN distribution and FSSN CAR model in Stan and to demonstrate their flexibility in describing the distribution of spatial effects adaptively, that is, their ability to model both symmetrical and asymmetrical spatial data patterns. A more mathematical and in-depth explanation of the FSSN distribution can be found in Fernandez and Steel [31].
This paper is organized as follows. Section 2 introduces the CAR model in general and its mathematical explanation. Section 3 describes the intrinsic conditional autoregressive (ICAR) or normal CAR model. The Stan code for the normal CAR model can be seen in Morris [32]. Further explanation of the normal CAR model derived from the mathematical calculation and applied to the Stan code is given in Appendix A. Section 4 describes the FSSN distribution and the FSSN CAR model and demonstrates the Stan code according to its mathematical description. Section 5 provides several scenarios for simulation studies on univariate and multivariate distributions. Section 6 contains the application and comparison of the normal CAR and FSSN CAR models using the Scotland lip cancer dataset and lung cancer dataset from the London Health Authority. The conclusions are given in Section 7.

2. Conditional Autoregressive (CAR) Model

Area data represent objects defined in terms of geometric features, such as points, lines, polygons, regions, and volumes. The regions are partitioned into a limited number of subregions with clear boundaries. Area data, consisting of a single aggregate measure per unit area, can have binary, count, or continuous values. These values can be modeled using the CAR model, which accounts for the proximity between neighboring areas. According to Besag [33], area data with a spatial structure show that neighboring regions have a higher correlation than regions that are far away from each other. Area data are different from point data, which consist of measurements from known geospatial points. While the relationship between areas is given in terms of adjacency, the relationship between two data points is expressed in units of distance.
Spatial interactions between a pair of areas $s_i$ and $s_j$, in a given set of observations taken at $n$ different areas of a region, can be modeled conditionally through a spatial random variable $\boldsymbol{\phi}$, which is a vector of length $n$, $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_n)^T$. In CAR models, the spatial relationship between the $n$ areas is represented by an $n \times n$ adjacency matrix $W$. The entries $w_{i,j}$ and $w_{j,i}$ are positive when the areas $s_i$ and $s_j$ are neighbors, and zero otherwise. The neighbor relationship, written as $s_i \sim s_j$, is defined in terms of this matrix; i.e., the neighbors of area $s_i$ are those areas that have non-zero entries in row or column $i$. The conditional distribution of each $\phi_i$ is specified in terms of a mean and a precision parameter $\tau$, and can be written as follows [33].
$$p_{\mathrm{Normal}}(\phi_i \mid \phi_j,\ j \neq i,\ \tau_i^{-1}) = N\Big(\alpha \sum_{s_i \sim s_j} w_{i,j}\,\phi_j,\ \tau_i^{-1}\Big), \quad i, j = 1, 2, \ldots, n, \tag{1}$$
where the parameter $\alpha$ controls the strength of the spatial association ($\alpha = 0$ corresponds to spatial independence) and $n$ is the number of areas in the region.
The corresponding joint distribution can be uniquely determined from the set of full conditional distributions by introducing a fixed point from the support of $p_{\mathrm{Normal}}(\cdot)$. The random vector $\boldsymbol{\phi}$ has a zero-mean multivariate normal distribution whose precision matrix is formed from two matrices that describe the neighborhood structure of the $n$ areas, i.e., the diagonal matrix $D$ and the adjacency matrix $W$, as written in Equation (2) [34].
$$\boldsymbol{\phi} \sim N_n\big(\mathbf{0},\ [D_{\tau}(I - \alpha W)]^{-1}\big) \tag{2}$$
where $N_n$ denotes the $n$-dimensional normal distribution; $\alpha$ is between 0 and 1; $D$ is an $n \times n$ diagonal matrix, where each diagonal entry $d_{i,i}$ contains the number of neighbors of area $s_i$, and all off-diagonal entries are zero; $W$ is the adjacency matrix, with entries $w_{i,j} = 1$ if the areas $s_i$ and $s_j$ are neighbors and $w_{i,j} = 0$ otherwise, and all diagonal entries $w_{i,i}$ are zero; and $I$ is the $n \times n$ identity matrix.
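To make the roles of $W$ and $D$ concrete, the following minimal Python sketch builds both matrices for a hypothetical four-area map (a chain 0-1-2-3, with zero-based indices) and forms a CAR precision matrix. It assumes the widely used parameterization $\tau(D - \alpha W)$, which coincides with the ICAR precision of Equation (4) when $\alpha = 1$; the map, the function name `car_precision`, and the values of `tau` and `alpha` are illustrative assumptions, not taken from the paper.

```python
# Sketch only: hypothetical 4-area chain map; assumes the precision form tau * (D - alpha * W).

def car_precision(neighbors, tau=1.0, alpha=0.9):
    """Return (W, D, Q) as nested lists, where Q = tau * (D - alpha * W)."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            W[i][j] = 1.0
            W[j][i] = 1.0  # adjacency is symmetric
    # D is diagonal; d_ii is the number of neighbors of area i (row sum of W).
    D = [[float(sum(W[i])) if i == j else 0.0 for j in range(n)] for i in range(n)]
    Q = [[tau * (D[i][j] - alpha * W[i][j]) for j in range(n)] for i in range(n)]
    return W, D, Q

# neighbors[i] lists the areas adjacent to area i.
neighbors = [[1], [0, 2], [1, 3], [2]]
W, D, Q = car_precision(neighbors, tau=1.0, alpha=0.9)
for row in Q:
    print(row)
```

With `alpha=1.0` every row of `Q` sums to zero, which is why the ICAR precision matrix is singular and an additional constraint on $\boldsymbol{\phi}$ is needed in practice.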

3. Intrinsic Conditional Autoregressive (ICAR) Model

An intrinsic conditional autoregressive (ICAR) or normal CAR model is a CAR model in which $\alpha$ in Equation (1) is equal to 1. The corresponding conditional distribution is expressed in Equation (3) [32].
$$p_{\mathrm{Normal}}(\phi_i \mid \phi_j,\ j \neq i,\ \tau_i^{-1}) = N\left(\frac{\sum_{i \sim j} \phi_j}{d_{i,i}},\ \frac{1}{d_{i,i}\,\tau_i}\right). \tag{3}$$
The individual spatial random variable $\phi_i$ for $s_i$ is normally distributed with a mean equal to the average of its neighbors. Its variance decreases as the number of neighbors increases. The joint distribution can be simplified as shown in Equation (4):
$$\boldsymbol{\phi} \sim N_n\big(\mathbf{0},\ [\tau(D - W)]^{-1}\big), \tag{4}$$
which can be rewritten in the pairwise difference form shown in Equation (5) (the derivation is given in Appendix A).
$$p_{\mathrm{Normal}}(\boldsymbol{\phi} \mid \tau) \propto \exp\left\{-\frac{\tau}{2} \sum_{s_i \sim s_j} (\phi_i - \phi_j)^2\right\}. \tag{5}$$
A full discussion of ICAR in Stan has been given by Morris et al. [32]. In this study, we only briefly mention the ICAR or normal CAR model in Stan as an introduction.
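The step from the joint form in Equation (4) to the pairwise difference form in Equation (5) rests on the identity $\boldsymbol{\phi}^T (D - W) \boldsymbol{\phi} = \sum_{s_i \sim s_j} (\phi_i - \phi_j)^2$, where each neighboring pair is counted once. The short Python sketch below checks this identity numerically on a hypothetical four-area chain map; the map and the values of `phi` are made up for illustration.

```python
# Sketch only: numerical check of phi^T (D - W) phi == sum over neighbor pairs of (phi_i - phi_j)^2.

def quad_form(phi, neighbors):
    """phi^T (D - W) phi, computed directly from the definitions of D and W."""
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        total += len(nbrs) * phi[i] ** 2             # contribution of D (number of neighbors)
        total -= sum(phi[i] * phi[j] for j in nbrs)  # contribution of W (adjacency)
    return total

def pairwise_sum(phi, neighbors):
    """Sum of squared differences, each neighboring pair counted once (i < j)."""
    return sum((phi[i] - phi[j]) ** 2
               for i, nbrs in enumerate(neighbors) for j in nbrs if i < j)

neighbors = [[1], [0, 2], [1, 3], [2]]  # hypothetical chain 0-1-2-3
phi = [0.5, -1.0, 2.0, 0.25]            # arbitrary illustrative values
print(quad_form(phi, neighbors), pairwise_sum(phi, neighbors))
```

Because only differences between neighbors enter the density, adding a constant to every $\phi_i$ leaves it unchanged, which is the reason a sum-to-zero constraint usually accompanies the ICAR prior.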

4. Fernandez–Steel Skew Normal Conditionally Autoregressive (FSSN CAR) Model

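Let $\phi \sim \mathrm{FSSN}(\mu, \sigma^2, \delta)$ with location, scale, and skewness parameters $-\infty < \mu < \infty$, $\sigma^2 > 0$, and $\delta > 0$, respectively. The probability density function (p.d.f.) is given in Equation (6).
$$p_{\mathrm{FSSN}}(\phi \mid \mu, \sigma^2, \delta) = \begin{cases} \dfrac{2\delta}{[1+\delta^2]\,\sigma}\; g\!\left(\dfrac{\delta[\phi-\mu]}{\sigma}\right), & \text{if } \phi < \mu, \\[2ex] \dfrac{2\delta}{[1+\delta^2]\,\sigma}\; g\!\left(\dfrac{[\phi-\mu]}{\delta\sigma}\right), & \text{if } \phi \geq \mu, \end{cases} \tag{6}$$
where $g$ and $G$ denote the standard normal p.d.f. and cumulative distribution function (CDF), respectively [24]. For $\phi \sim \mathrm{FSSN}(\mu, \sigma^2, \delta)$, the mean and variance are given by
$$E[\phi] = \mu + \frac{\sqrt{2}\,\sigma(\delta^2 - 1)}{\sqrt{\pi}\,\delta} \quad \text{and} \quad \mathrm{Var}(\phi) = \frac{\sigma^2\left\{(\pi - 2)\delta^6 + 2\delta^2(\delta^2 + 1) + \pi - 2\right\}}{\pi\delta^2(1 + \delta^2)},$$
respectively [24]. Then, the relationships between $\mu$ and the mean, and between $\sigma^2$ and the variance, are
$$\mu = E[\phi] - \frac{\sqrt{2}\,\sigma(\delta^2 - 1)}{\sqrt{\pi}\,\delta} \quad \text{and} \quad \sigma^2 = \frac{\pi\delta^2(1 + \delta^2)\,\mathrm{Var}(\phi)}{(\pi - 2)\delta^6 + 2\delta^2(\delta^2 + 1) + \pi - 2},$$
respectively.
Equation (6) can be written as follows:
$$p_{\mathrm{FSSN}}(\phi \mid \mu, \sigma^2, \delta) = \frac{2\delta}{[1+\delta^2]}\,\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\left(-\frac{1}{2}\left(\frac{\phi-\mu}{\sigma}\right)^2\right)\left\{\delta^2 I_{[\phi<\mu]}(\phi) + \frac{1}{\delta^2} I_{[\phi\geq\mu]}(\phi)\right\}\right]. \tag{7}$$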
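As a numerical sanity check of the formulas above, the following Python sketch evaluates the density in the form of Equation (7) and compares numerically integrated moments with the closed-form mean and variance. The parameter values, the helper names, and the crude midpoint integration rule are illustrative choices only.

```python
import math

def fssn_pdf(x, mu, sigma, delta):
    """FSSN density in the form of Equation (7)."""
    z = (x - mu) / sigma
    # delta^2 applies left of mu, 1/delta^2 at or right of mu (the indicator terms).
    factor = delta ** 2 if x < mu else 1.0 / delta ** 2
    const = 2.0 * delta / (1.0 + delta ** 2) / math.sqrt(2.0 * math.pi * sigma ** 2)
    return const * math.exp(-0.5 * z * z * factor)

def fssn_mean(mu, sigma, delta):
    return mu + math.sqrt(2.0) * sigma * (delta ** 2 - 1.0) / (math.sqrt(math.pi) * delta)

def fssn_var(sigma, delta):
    d2 = delta ** 2
    num = (math.pi - 2.0) * delta ** 6 + 2.0 * d2 * (d2 + 1.0) + math.pi - 2.0
    return sigma ** 2 * num / (math.pi * d2 * (1.0 + d2))

def integrate(f, lo, hi, steps=100000):
    """Midpoint rule; crude, but adequate for a smooth, rapidly decaying density."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

mu, sigma, delta = 1.0, 2.0, 1.5  # illustrative values, not from the paper
lo, hi = mu - 40.0, mu + 40.0
total = integrate(lambda x: fssn_pdf(x, mu, sigma, delta), lo, hi)
mean = integrate(lambda x: x * fssn_pdf(x, mu, sigma, delta), lo, hi)
second = integrate(lambda x: x * x * fssn_pdf(x, mu, sigma, delta), lo, hi)
print(total, mean, fssn_mean(mu, sigma, delta))
print(second - mean ** 2, fssn_var(sigma, delta))
```

Setting `delta = 1.0` makes both indicator factors equal to one, so `fssn_pdf` then coincides with the ordinary normal density with the same `mu` and `sigma`.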
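The construction of multivariate skewed distributions is based on linear transformations of univariate skewed distributions [35]. Let $n$ be the dimension of the spatial random variable $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_n)^T \in \mathbb{R}^n$, with mean vector $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)^T \in \mathbb{R}^n$, $n \times n$ positive definite variance-covariance matrix $\Sigma$, and skewness parameter vector $\boldsymbol{\delta} = (\delta_1, \delta_2, \ldots, \delta_n)^T \in \mathbb{R}_{+}^{n}$; the multivariate FSSN p.d.f. can be written as follows [35]:
$$p_{\mathrm{FSSN}}(\boldsymbol{\phi} \mid \boldsymbol{\mu}, \Sigma, \boldsymbol{\delta}) = \prod_{i=1}^{n} p_{\mathrm{FSSN}}(\phi_i \mid \mu_i, \sigma_i^2, \delta_i) \tag{8}$$
where each $p_{\mathrm{FSSN}}(\phi_i \mid \mu_i, \sigma_i^2, \delta_i)$ is as in Equation (7). The p.d.f. of the multivariate FSSN, therefore, can be written as follows:
$$p_{\mathrm{FSSN}}(\boldsymbol{\phi} \mid \boldsymbol{\mu}, \Sigma, \boldsymbol{\delta}) = \prod_{i=1}^{n} p_{\mathrm{FSSN}}(\phi_i \mid \mu_i, \sigma_i^2, \delta_i) = \prod_{i=1}^{n} \frac{2\delta_i}{[1+\delta_i^2]}\,\frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left[\left(-\frac{1}{2}\left(\frac{\phi_i-\mu_i}{\sigma_i}\right)^2\right)\left\{\delta_i^2 I_{[\phi_i<\mu_i]}(\phi_i) + \frac{1}{\delta_i^2} I_{[\phi_i\geq\mu_i]}(\phi_i)\right\}\right]. \tag{9}$$
The individual spatial random variable $\phi_i$ for $s_i$ has an FSSN distribution with mean and variance equal to those of the normal distribution in Equation (3) and a skewness parameter $\delta > 0$. The joint distribution can be simplified as in Equation (10):
$$\boldsymbol{\phi} \sim \mathrm{FSSN}_n\big(\mathbf{0},\ [\tau(D - W)]^{-1},\ \boldsymbol{\delta}\big) \tag{10}$$
where we set $\tau = 1$, so $\boldsymbol{\phi} \sim \mathrm{FSSN}_n(\mathbf{0}, [D - W]^{-1}, \boldsymbol{\delta})$; thus, we obtain a p.d.f. as in Equation (11):
$$p_{\mathrm{FSSN}}(\boldsymbol{\phi} \mid \mathbf{0}, [D - W]^{-1}, \boldsymbol{\delta}) = \prod_{i=1}^{n} p_{\mathrm{FSSN}}\big(\phi_i \mid 0, (d_{i,i} - w_{i,i})^{-1}, \delta_i\big) = \prod_{i=1}^{n} \left(\frac{2\delta_i}{1+\delta_i^2}\right) \frac{1}{\sqrt{2\pi(d_{i,i} - w_{i,i})^{-1}}}\; Q_i \tag{11}$$
where $Q_i = \exp\left[\left(-\dfrac{1}{2}\,\dfrac{\phi_i^2}{(d_{i,i} - w_{i,i})^{-1}}\right)\left\{\delta_i^2 I_{[\phi_i<0]}(\phi_i) + \dfrac{1}{\delta_i^2} I_{[\phi_i\geq 0]}(\phi_i)\right\}\right]$. Since $\dfrac{1}{\sqrt{2\pi(d_{i,i} - w_{i,i})^{-1}}}$ is a constant, Equation (11) can be rewritten in a proportional form as follows:
$$p_{\mathrm{FSSN}}(\boldsymbol{\phi} \mid \mathbf{0}, [D - W]^{-1}, \boldsymbol{\delta}) \propto \prod_{i=1}^{n} \left(\frac{2\delta_i}{1+\delta_i^2}\right) \exp\left[\left(-\frac{1}{2}\,\frac{\phi_i^2}{(d_{i,i} - w_{i,i})^{-1}}\right)\left\{\delta_i^2 I_{[\phi_i<0]}(\phi_i) + \frac{1}{\delta_i^2} I_{[\phi_i\geq 0]}(\phi_i)\right\}\right], \tag{12}$$
with explanations analogous to Appendix A. In logarithmic form, the pairwise difference formulation can be expressed as in Equation (13):
$$\begin{aligned} \log\big(p_{\mathrm{FSSN}}(\boldsymbol{\phi} \mid \mathbf{0}, [D - W]^{-1}, \boldsymbol{\delta})\big) &= \sum_{i=1}^{n}\left(\log 2 + \log\delta_i - \log(1+\delta_i^2) - \frac{1}{2}\left(\frac{\phi_i^2}{(d_{i,i} - w_{i,i})^{-1}}\right)\left\{\delta_i^2 I_{[\phi_i<0]}(\phi_i) + \frac{1}{\delta_i^2} I_{[\phi_i\geq 0]}(\phi_i)\right\}\right) \\ &= \sum_{i=1}^{n}\left(\log 2 + \log\delta_i - \log(1+\delta_i^2) - \frac{1}{2}(\phi_i - \phi_j)^2\left\{\delta_i^2 I_{[\phi_i<0]}(\phi_i) + \frac{1}{\delta_i^2} I_{[\phi_i\geq 0]}(\phi_i)\right\}\right) \end{aligned} \tag{13}$$
where $s_i \sim s_j$ means that $s_i$ neighbors $s_j$ and $i \neq j$.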
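The pairwise form of Equation (13) can be illustrated with a short Python sketch. It makes two assumptions that are not stated in the text: the sum is taken over a list of neighboring pairs (`node1[k]`, `node2[k]`), as in the ICAR edge-list representation, and the skewness factor of each term is selected by the sign of $\phi$ at the first node of the pair. Indices are zero-based, and all names and values are hypothetical.

```python
import math

def fssn_car_log_density(phi, node1, node2, delta):
    """Unnormalized log density in the spirit of Equation (13), summed over neighbor pairs.

    Assumption: each pair (node1[k], node2[k]) contributes one term, and the
    skewness factor is chosen by the sign of phi at node1[k].
    """
    total = 0.0
    for i, j in zip(node1, node2):
        d2 = delta[i] ** 2
        skew = d2 if phi[i] < 0 else 1.0 / d2  # indicator terms of Equation (13)
        diff2 = (phi[i] - phi[j]) ** 2          # squared pairwise difference
        total += math.log(2.0) + math.log(delta[i]) - math.log1p(d2) - 0.5 * diff2 * skew
    return total

# Hypothetical chain map 0-1-2-3 stored as an edge list.
node1, node2 = [0, 1, 2], [1, 2, 3]
phi = [0.5, -1.0, 2.0, 0.25]
symmetric = fssn_car_log_density(phi, node1, node2, [1.0] * 4)
print(symmetric)
```

With every $\delta_i = 1$ the terms $\log 2 + \log\delta_i - \log(1+\delta_i^2)$ vanish and the result equals $-\tfrac{1}{2}\sum (\phi_i - \phi_j)^2$, i.e., the log of Equation (5) with $\tau = 1$.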

4.1. Adding FSSN Distribution in Stan

As stated in the Introduction, Wetzels et al. [21] showed that adding a new distribution to BUGS is complicated and requires another specialized program, namely the BlackBox Component Builder. Likewise, adding a new distribution in JAGS involves difficult steps, as stated by Wabersich et al. [22]. Unlike BUGS and JAGS, Stan makes it easy for users to add new distributions. Once the mathematical form of the distribution is known, the Stan code can be written by following the steps of its mathematical calculation. This convenience makes it possible for researchers to add new distributions and models, such as the CAR model. Adding a custom distribution in Stan has already been explained by Annis et al. [20]. Therefore, Stan was chosen to implement spatial modeling involving the adaptive FSSN distribution. Based on Equation (7), the Stan code for the user-defined FSSN distribution is given in Listing 1.
Listing 1. A user-defined Stan code of Fernandez–Steel skew normal (FSSN) distribution.
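 functions{
   // Log density of the FSSN(mu, sigma, delta) distribution, following Equation (7).
   real FSSN_lpdf(real x, real mu, real sigma, real delta){
     real logpdf;
     real z;
     real delta2;
     delta2=delta*delta;
     // Reject invalid scale and skewness parameters.
     if(sigma<=0)
       reject("sigma <= 0; found sigma =", sigma);
     if(delta<=0)
       reject("delta<=0; found delta =", delta);
     z=(x-mu);  // centered observation
     if(x<mu){
       // Left of the location: the deviation is multiplied by delta.
       logpdf=normal_lpdf(delta*z|0,sigma)+log(2)+log(delta)-log1p(delta2);
     }
     else{
       // At or right of the location: the deviation is divided by delta.
       logpdf=normal_lpdf(z/delta|0,sigma)+log(2)+log(delta)-log1p(delta2);
     }
     return logpdf;
   }
 }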

4.2. Adding the FSSN CAR Model in Stan

The implementation of the normal CAR or ICAR model in Stan can be seen in Morris et al. [32]. In this study, we created user-defined Stan code for the proposed FSSN CAR model. Based on Equation (13), the Stan code for the FSSN CAR model is given in Listing 2.
Listing 2. A user-defined Stan code for the FSSN conditional autoregressive (CAR) model.
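real car_FSSN_lpdf(vector phi, int N, int[] node1, int[] node2, vector delta){
  // Log density of the FSSN CAR prior, following Equation (13).
  // node1[i] and node2[i] index a pair of neighboring areas.
  vector [N] logpdf;
  vector [N] delta2;
  vector [N] phi2;
  for(i in 1:N){
  delta2[i]=delta[i]*delta[i];
  // squared pairwise difference between neighboring areas
  phi2[i]=(phi[node1][i]- phi[node2][i])*(phi[node1][i]- phi[node2][i]);
  if(delta[i]<=0)
  reject("delta[i]<=0; found delta[i] =", delta[i]);
  }
  for(i in 1:N){
  // The skewness factor is delta^2 for phi[i] < 0 and 1/delta^2 otherwise.
  // The FSSN_lpdf term on sum(phi) keeps the sum of phi close to zero.
  if(phi[i]<0)
  logpdf[i]=log(2)+log(delta[i])-log1p(delta2[i])-0.5*delta2[i]*phi2[i]
  +FSSN_lpdf(sum(phi)|0,0.001*N,mean(delta));
  else
  logpdf[i]=log(2)+log(delta[i])-log1p(delta2[i])-0.5*(1/delta2[i])*phi2[i]
  +FSSN_lpdf(sum(phi)|0,0.001*N,mean(delta));
  }
  return sum(logpdf);
  }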

5. Simulation Study

5.1. Simulation for Univariate Distribution

From Equation (7), it can be seen that, when the skewness parameter $\delta = 1$, the FSSN distribution reduces to a normal distribution with the same parameters $\mu$ and $\sigma^2$. Thus, to validate the user-defined FSSN distribution in Stan, data are generated from a normal distribution with parameters ($\mu$, $\sigma^2$) and then estimated in Stan using the user-defined FSSN distribution. The user-defined FSSN distribution is valid for normal data if it recovers $\mu$ and $\sigma^2$ in accordance with the generating parameters. It must also provide an estimate of $\delta$ close to 1, which indicates that the estimated data are symmetric.
Next, we provide evidence of the application of the FSSN CAR model in Stan. Here, we created 24 scenarios that involved the normal, DE, and Student-t distributions. These three distributions are already built into Stan, and the data generated from them were estimated using the user-defined FSSN distribution. The scenarios include difficult schemes, which were used to test the ability of the FSSN distribution to handle data with very different variances and zero-centered data with extreme leptokurtic properties. Each scenario was designed using sample sizes of 125, 250, 500, and 1000, generated from the normal, DE, and Student-t distributions. Each scenario was replicated 500 times.
The parameter estimation was carried out with HMC in Stan using 4 chains and 10,000 iterations. The 24 scenarios for the normal, DE, and Student-t distributions are displayed in Table 1.
From the 24 scenarios above, we obtained the highest posterior density (HPD) interval for each parameter, as seen in Table 2. To complement the parameter estimates obtained from the 500 replications, the bias, the root-mean-squared error (RMSE), and the coverage probability (CP) are given in Table 3, Table 4 and Table 5, respectively. For example, the bias of the estimated parameter $\hat{\mu}$ is defined by Equation (14) [36]:
$$\mathrm{bias}(\hat{\mu}) = E[\hat{\mu}] - \mu \tag{14}$$
where the closer the bias is to zero, the better the parameter estimation. The RMSE of the estimated parameter $\hat{\mu}$ is defined by Equation (15) [36,37]:
$$\mathrm{RMSE}_{\hat{\mu}} = \sqrt{\frac{\sum_{i=1}^{n_{rep}} (\mu - \hat{\mu}_i)^2}{n_{rep}}} \tag{15}$$
where $n_{rep}$ is the number of replications; the closer the RMSE is to zero, the better the parameter estimation. The bias and RMSE of the other parameters are defined analogously. The coverage probability (CP) is the proportion (in percent) of the time that the interval contains the true value of interest [38]. The closer the CP is to 100, the better the parameter estimation.
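The three summary measures can be sketched in a few lines of Python. The replicate estimates and intervals below are invented purely for illustration, and the expectation in Equation (14) is approximated by the average over replications.

```python
import math

def bias(estimates, truth):
    """Equation (14), with E[mu_hat] approximated by the mean over replications."""
    return sum(estimates) / len(estimates) - truth

def rmse(estimates, truth):
    """Equation (15): root of the mean squared deviation from the true value."""
    return math.sqrt(sum((truth - e) ** 2 for e in estimates) / len(estimates))

def coverage_probability(intervals, truth):
    """Percentage of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lo, hi in intervals if lo <= truth <= hi)
    return 100.0 * hits / len(intervals)

# Hypothetical results from four replications with true mu = 1.0.
mu_true = 1.0
mu_hat = [0.9, 1.1, 1.2, 0.8]
hpd = [(0.5, 1.5), (1.05, 1.6), (0.2, 1.0), (0.7, 0.95)]
print(bias(mu_hat, mu_true), rmse(mu_hat, mu_true), coverage_probability(hpd, mu_true))
```

In the simulation study described above, `mu_hat` would hold one posterior point estimate per replication (500 in total) and `hpd` the corresponding HPD intervals.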
The discussion of the HPD interval, bias, RMSE, and CP in the 24 scenarios above rests on some characteristics of the distributions and on the relationship between the FSSN and the normal, double-exponential (DE), and Student-t distributions. Mathematically, the normal distribution is a special case of the FSSN distribution: when the FSSN distribution has $\delta = 1$, it becomes a normal distribution [24]. The DE distribution is not a special case of the FSSN distribution. It has a symmetrical pattern with fat tails and can be leptokurtic. Given simulation data generated from the DE distribution, the FSSN distribution is expected to be able to estimate and identify the data pattern. The Student-t distribution has the same symmetrical pattern as the normal distribution, but it has only one parameter, namely the degrees of freedom $\nu$. The smaller the $\nu$, the fatter the tails. Its mean is zero and its variance is equal to $\frac{\nu}{\nu - 2}$. With this information, the FSSN is expected to be able to estimate the data generated from the Student-t distribution with $\mu$ equal to zero, $\sigma$ equal to $\sqrt{\frac{\nu}{\nu - 2}}$, and $\delta = 1$ [39].
From Table 2, we can infer that the FSSN distribution is able to estimate the simulation data generated from the normal, DE, and Student-t distributions, because the HPD intervals of all the estimates $\hat{\delta}$ are close to 1. The FSSN distribution thus succeeds in identifying that each random sample comes from a symmetrical distribution. From Table 3, it can be seen that the biases of the estimated parameters of the FSSN distribution are close to zero, which demonstrates that the FSSN distribution fits these generated data patterns well. In Table 4, for the normal scenarios, the RMSE of the estimated parameters of the FSSN distribution is smaller than that of the normal distribution. For the data generated from the DE and Student-t distributions, the RMSE of the estimated parameters of the FSSN distribution is close to zero. In Table 5, for the normal scenarios, the CPs of the estimated parameters of the FSSN distribution are greater than or equal to those of the normal distribution. For the DE and Student-t (T) scenarios, some CPs of the estimated parameters of the FSSN distribution are smaller than 80%. This does not mean that the FSSN distribution performs poorly, but rather that the FSSN distribution adaptively describes the data patterns resulting from the DE and Student-t distributions through its own parameters. This reflects the difference in mathematical representation between the DE and Student-t distributions and the FSSN distribution. Two conclusions can be drawn from this explanation. Firstly, the FSSN distribution is better than the normal distribution in estimating the data generated from the normal distribution. Secondly, the FSSN distribution is able to estimate the data generated from the DE and Student-t distributions.

5.2. Simulation for Multivariate Distribution

In this subsection, we describe the simulation for the multivariate extension of the FSSN distribution, as written in Equation (9). This simulation had eight scenarios involving the multivariate normal distribution, keeping in mind that, mathematically, the normal distribution is a special case of the FSSN distribution. Each scenario was designed using sample sizes of 50, 100, 150, and 200, generated from the trivariate normal distribution $N_3(\boldsymbol{\mu}, \Sigma)$, whereas the numbers of chains, iterations, and replications were the same as in the univariate simulation. The eight scenarios are displayed in Table 6.
From the eight scenarios above, we obtained the Euclidean distance of $(\boldsymbol{\mu} - \hat{\boldsymbol{\mu}})$ and the determinant of the Euclidean distance matrix of $(\Sigma - \hat{\Sigma})$. When using the trivariate FSSN distribution, we also obtained the Euclidean distance of $(\boldsymbol{\delta} - \hat{\boldsymbol{\delta}})$ in addition to these two. The formulas for the Euclidean distance and the Euclidean distance matrix can be found in Dokmanic et al. [40] and Lele [41], respectively. In mathematics, the Euclidean distance between two points is the length of the line segment between them [40]. The interpretation of the determinant of a matrix is closely related to volume [42]. Thus, the smaller the Euclidean distance and the determinant of the Euclidean distance matrix, the better the parameter estimation. Table 7 gives the Euclidean distance between the estimated parameter $\hat{\boldsymbol{\mu}}$ and the parameter $\boldsymbol{\mu}$, the determinant of the Euclidean distance matrix for the difference between the estimated parameter $\hat{\Sigma}$ and the parameter $\Sigma$, and the Euclidean distance between the estimated parameter $\hat{\boldsymbol{\delta}}$ and the parameter $\boldsymbol{\delta}$.
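For the vector parameters, the Euclidean distance used as an accuracy measure can be sketched in Python as follows. The true and estimated vectors are made-up values for a trivariate case, and the determinant-based measure for $\Sigma$ is not reproduced here because its exact construction is given in the cited references rather than in this section.

```python
import math

def euclidean_distance(a, b):
    """Length of the line segment between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical true and estimated parameter vectors for a trivariate scenario.
mu_true = [0.0, 0.0, 0.0]
mu_hat = [0.3, -0.4, 0.0]
delta_true = [1.0, 1.0, 1.0]   # symmetric data correspond to delta = 1 in every dimension
delta_hat = [1.02, 0.97, 1.01]
print(euclidean_distance(mu_true, mu_hat))
print(euclidean_distance(delta_true, delta_hat))
```

A distance close to zero for `delta_hat` indicates that the fitted FSSN distribution has recognized the symmetry of the generated data.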
In Table 7, in the “Euclidean Distance of $(\boldsymbol{\mu} - \hat{\boldsymbol{\mu}})$” column, there are two Euclidean distances for the FSSN distribution that are greater than those for the normal distribution, namely, in the MVN1-100 and MVN1-200 scenarios. In the “Determinant of Euclidean Distance Matrix of $(\Sigma - \hat{\Sigma})$” column, very small values were obtained for both the normal and FSSN distributions. In the “Euclidean Distance of $(\boldsymbol{\delta} - \hat{\boldsymbol{\delta}})$” column, the values obtained were close to zero for all scenarios. On the basis of the results in Table 7, we found that the multivariate FSSN distribution is able to estimate the data generated from the multivariate normal distribution.

6. Application

This section discusses the application of the FSSN CAR model compared to the normal CAR model, using the Scotland lip cancer dataset and the lung cancer dataset from the London Health Authority. The steps for comparing the normal CAR and FSSN CAR models as applied to these data are shown in STEPS A:
STEPS A. The steps of comparing the normal CAR and FSSN CAR modeling
  • Define Model 1, the regression model without spatial effects, and estimate its parameters.
  • Examine the error pattern calculated from Model 1.
  • Plot the fitted normal and FSSN distributions against the errors of Model 1 and compare them.
  • Define Model 2, the regression model with normal CAR spatial effects, and estimate its regression parameters.
  • Define Model 3, the regression model with FSSN CAR spatial effects, and estimate its regression parameters.
  • Compare the plots of the estimated posterior parameters for the three models: Model 1, Model 2, and Model 3.
  • Compare the histograms of the errors of the three models.
  • Calculate the widely applicable information criterion (WAIC), which can be seen in Watanabe [43], and the leave-one-out (LOO) cross-validation for all three models. The model with the smallest WAIC and LOO values is the best. A deeper explanation of WAIC and LOO can be seen in Vehtari et al. [44].
The modeling steps in Stan with the R interface (RStan) used in this study are given in Figure 1.
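The WAIC used in the last step of STEPS A can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not part of the RStan syntax of this paper, which relies on the "loo" package. The sketch assumes the usual definition in Vehtari et al. [44]: the log pointwise predictive density minus the sum of the posterior variances of the pointwise log-likelihood, reported on the deviance scale. The function names and the toy matrix are hypothetical.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def sample_variance(values):
    """Sample variance with denominator S - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def waic(log_lik):
    """WAIC on the deviance scale from a draws-by-observations matrix.

    log_lik[s][i] is the log-likelihood of observation i under posterior draw s.
    """
    n_obs = len(log_lik[0])
    lppd = 0.0      # log pointwise predictive density
    p_waic = 0.0    # effective number of parameters
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd += log_mean_exp(column)
        p_waic += sample_variance(column)
    return -2.0 * (lppd - p_waic)

# Hypothetical toy matrix: 3 posterior draws, 2 observations
toy_log_lik = [[-1.0, -2.0], [-1.2, -1.9], [-0.9, -2.1]]
print(waic(toy_log_lik))
```

In the application, the matrix would correspond to the "log_lik" quantities generated by the Stan program, and a smaller WAIC indicates a better model.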

6.1. Scotland Lip Cancer Dataset

The Scotland lip cancer dataset, first discussed in Clayton and Kaldor [45] and Morris et al. [32], is also available in the GeoBUGS examples of the OpenBUGS software (https://www.mrc-bsu.cam.ac.uk/software/bugs/the-bugs-project-the-bugs-book/bugs-book-examples/). The data include the observed and expected cases (the expected numbers are based on the population and its age and sex distribution in each county), a covariate measuring the percentage of the population engaged in agriculture, fishing, or forestry, and the “position” of each county, expressed as a list of adjacent counties. For the lip cancer example, the model may be written as follows:
$$O_i \sim \text{Poisson}(\mu_i)$$
where $\log \mu_i = \log E_i + \beta_0 + \beta_1 \frac{x_i}{10} + \phi_i$ and $\phi_i$ is a spatial random effect.
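To make the role of each term concrete, the following minimal sketch evaluates the Poisson log-probability of one observed count under this model. It is written in Python for illustration only and is not part of the RStan syntax of this paper. The function name and the parameter values are hypothetical; the counts and covariate are the first entries of the data in Listing 3.

```python
import math

def poisson_log_lik(o, e, x, beta0, beta1, phi):
    """Log-probability of an observed count o under the lip cancer model."""
    # Linear predictor on the log scale, with log(E) acting as an offset
    log_mu = math.log(e) + beta0 + beta1 * x / 10.0 + phi
    mu = math.exp(log_mu)
    # Log Poisson pmf: o * log(mu) - mu - log(o!)
    return o * log_mu - mu - math.lgamma(o + 1)

# O = 9, E = 1.4, x = 16 are the first area in Listing 3;
# beta0, beta1, and phi are hypothetical values chosen for illustration
print(poisson_log_lik(o=9, e=1.4, x=16, beta0=0.0, beta1=0.5, phi=0.1))
```

Because $\log E_i$ enters with a fixed coefficient of one, $\exp(\beta_0 + \beta_1 x_i / 10 + \phi_i)$ can be read as the relative risk of area $i$.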
To make our research applicable, we include the complete syntax for modeling the lip cancer dataset with the FSSN CAR model in the R interface (RStan), following the steps in Figure 1. For readability, the syntax is divided into four parts: Listing 3, Listing 4, Listing 5, and Listing 6. We wrote this syntax based on the BUGS syntax available on the website listed above. Two preparatory steps are required. First, install Stan with the R interface (RStan), as described at https://github.com/stan-dev/rstan/wiki/Installing-RStan-from-source-on-Windows. Second, install the R packages needed to run this syntax, i.e., “rstan”, “ggplot2”, “StanHeaders”, “coda”, “dplyr”, “ggmcmc”, “shinystan”, “shiny”, and “loo”. In Listing 3, “O”, “E”, and “x” hold the non-spatial research data, “num” holds the number of neighbors of each area, “adj” holds the list of neighbors of each area, and “weights” holds one entry equal to 1 for every neighbor listed in “adj”. Listing 4 provides the “mungeCARdata4stan” function, which converts the spatial data in Listing 3 into node form. Listing 5 is the full code for the FSSN CAR model in Stan, where the FSSN distribution and the FSSN CAR prior are provided in Listing 1 and Listing 2, respectively. Listing 6 runs the Stan program, displays the summaries, saves the summaries to a directory, saves the plots of the posterior summaries, and finally calculates the WAIC and LOO values.
Listing 3. Writing the research data in R, in the format used by the Bayesian inference using Gibbs sampling (BUGS) program.
 #-----Research Data Including Spatial Data-----
 data = list(S = 56,
          O = c(9, 39, 11, 9, 15, 8, 26, …),
          E = c( 1.4, 8.7, 3.0, 2.5, 4.3, 2.4, 8.1, …),
          x = c(16,16,10,24,10,24,10, …),
          num = c(4, 2, …),
          adj = c( 5,9,11,19,
                7,10,
                …),
          weights = c( 1,1,1,1,
                  1,1,
                    …)
          )
Listing 4. The “mungeCARdata4stan” function is used to convert spatial data into node form.
  #-----Function for Rewriting Spatial Data into Node Form-----
  mungeCARdata4stan = function(adjBUGS, numBUGS){
   S = length(numBUGS);
   ss = numBUGS;
   S_edges = length(adjBUGS) / 2;   # each undirected edge appears twice in adjBUGS
   node1 = vector(mode="numeric", length=S_edges);
   node2 = vector(mode="numeric", length=S_edges);
   iAdj = 0;
   iEdge = 0;
   for(i in 1:S){
    for(j in seq_len(ss[i])){       # seq_len() skips areas with zero neighbors
     iAdj = iAdj + 1;
     if(i < adjBUGS[iAdj]){         # keep each edge once, with node1 < node2
      iEdge = iEdge + 1;
      node1[iEdge] = i;
      node2[iEdge] = adjBUGS[iAdj];
     }
    }
   }
   return (list("S"=S,"S_edges"=S_edges,"node1"=node1,"node2"=node2));
  }
  
 #-----Calling Variables Used for Spatial-----
 options(mc.cores = parallel::detectCores())
  
 nbs = mungeCARdata4stan(data$adj, data$num);
 S = data$S;              #Number of Areas
 node1 = nbs$node1;
 node2 = nbs$node2;
 S_edges = nbs$S_edges;
 O = data$O;
 x = data$x;
 E = data$E;
  
 #-----Writing a List of Data-----
 data.list = 
 list(O=O,E=E,x=x,S=S,S_edges=S_edges,node1=node1,node2=node2)
 str(data.list)
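The conversion performed in Listing 4 can be illustrated on a small map. The following minimal sketch is written in Python for illustration only and is not part of the RStan syntax of this paper. It mirrors the logic of “mungeCARdata4stan” under the assumption that “adj” lists, area by area, the 1-based labels of the neighbors and that “num” gives the number of neighbors per area. The function name and the five-area map are hypothetical.

```python
def bugs_adjacency_to_edges(adj, num):
    """Convert BUGS-style adjacency (adj, num) into paired node lists.

    Each undirected edge is kept once, with node1[k] < node2[k].
    """
    node1, node2 = [], []
    pos = 0  # current read position in the flat adjacency list
    for i, n_neighbors in enumerate(num, start=1):
        for neighbor in adj[pos:pos + n_neighbors]:
            if i < neighbor:
                node1.append(i)
                node2.append(neighbor)
        pos += n_neighbors
    return node1, node2

# Hypothetical five-area map with edges 1-2, 2-3, 2-5, 3-4, 3-5
num = [1, 3, 3, 1, 2]
adj = [2,  1, 3, 5,  2, 4, 5,  3,  2, 3]
node1, node2 = bugs_adjacency_to_edges(adj, num)
print(node1)
print(node2)
```

Storing only the pairs with node1 < node2 halves the adjacency list, which is why the number of edges equals half the length of “adj”.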
Listing 5. The full syntax of the FSSN CAR model in Stan.
 #-----Stan Model------
 modelString = '
 functions{
   real FSSN_lpdf(real x, real mu, real sigma, real delta){
    …
   }
   real car_FSSN_lpdf(vector phi, int N, int[] node1, int[] node2, real delta){
   …
       }
   }
 }
 data {
  int<lower=0> S;
  int<lower=0> O[S];
  vector<lower=0> [S] E;
  vector<lower=0> [S] x;
  int<lower=0> S_edges;
  int<lower=1, upper=S> node1[S_edges]; // node1[i] adjacent to node2[i]
  int<lower=1, upper=S> node2[S_edges]; // and node1[i] < node2[i]
 }
 parameters {
  real alpha0;
  real alpha1;
  vector[S] fi;
  real<lower=0> delta;
 }
 model {
  alpha0~normal(0,1);
  alpha1~normal(0,1);
  delta~gamma(1,1);
  fi~car_FSSN(S,node1,node2,delta);
  
  for (i in 1:S)
   O[i]~poisson(exp(log(E[i])+alpha0+alpha1*x[i]/10+fi[i]));
 }
 generated quantities {
  vector[S] log_lik;
  for (i in 1:S)
   log_lik[i] = poisson_lpmf(O[i] | exp(log(E[i])+alpha0+alpha1*x[i]/10+fi[i]));
 }
 '
Listing 6. Running the Stan program, displaying the summaries along with the graph plots, and calculating the WAIC and LOO values. The “package::function” comments indicate which package provides each function.
#-----Loading Packages-----
library(rstan)
library(coda)
library(ggmcmc)
library(loo)
 
#-----Running Stan Model-----
# rstan::stan_model, rstan::stan
stanDso = stan_model(model_code=modelString)
stanFit = stan(model_code=modelString, data=data.list, chains=4, iter=2000, thin=1, control=list(max_treedepth=20, adapt_delta=0.99))
 
#-----Display Summary-----
# base::summary
Summaries = summary(stanFit)
Summaries
 
#-----Save Summary-----
# utils::capture.output
ofile = "D:/Output.txt";
options(max.print = .Machine$integer.max)   # set before printing so the summary is not truncated
capture.output(print(Summaries$summary, digits=3, probs=c(0.025, 0.975)), file=ofile)
 
#-----Display the Plot Summary-----
# coda::mcmc.list, coda::mcmc, ggmcmc::ggs, ggmcmc::ggmcmc
stan2coda = function(stanFit){
 mcmc.list(lapply(1:ncol(stanFit), function(X) mcmc(as.array(stanFit)[,X,])))}
fit.mcmc = stan2coda(stanFit)
ggso = ggs(fit.mcmc)
ggmcmc(ggso, file="D:/Output.pdf")
 
#-----Calculating WAIC and LOO-----
# loo::extract_log_lik, loo::waic, loo::loo
log_lik_1 = extract_log_lik(stanFit, parameter_name = "log_lik", merge_chains = TRUE)
waic(log_lik_1)
loo(log_lik_1)
Model 1Lip was set as Model 1, the first step of STEPS A, and applied to the Scotland lip cancer dataset. The parameter estimation results for Model 1Lip are given in Table 8. The term “semean” is the Monte Carlo standard error and “std dev” is the posterior standard deviation [46]. In the last two columns of Table 8, “neff” is a crude measure of the effective sample size and “$\hat{R}$” is the potential scale reduction factor on split chains. At convergence, $\hat{R}$ equals 1 (see Carpenter et al. [47]); by this criterion, Model 1Lip has reached convergence. The error pattern, together with the approximating normal and FSSN distributions, is shown in Figure 2.
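The potential scale reduction factor on split chains can be illustrated with a minimal sketch. It is written in Python for illustration only and assumes the classical split-chain definition (between-chain variance compared with within-chain variance after halving each chain); the value reported by a given Stan version may be computed with further refinements. The function name and the toy draws are hypothetical.

```python
import math

def split_rhat(chains):
    """Potential scale reduction factor computed on split chains.

    chains: list of equal-length lists of draws for one scalar parameter.
    """
    half = len(chains[0]) // 2
    # Split every chain into two halves so within-chain drift is also detected
    parts = []
    for chain in chains:
        parts.append(chain[:half])
        parts.append(chain[half:2 * half])
    m, n = len(parts), half
    means = [sum(p) / n for p in parts]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(sum((v - mu) ** 2 for v in p) / (n - 1)
            for p, mu in zip(parts, means)) / m
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)

# Hypothetical draws: two chains that agree, then two chains that disagree
agree = [[0, 1, 0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 0, 1, 0, 1]]
disagree = [[0, 1, 0, 1, 0, 1, 0, 1], [10, 11, 10, 11, 10, 11, 10, 11]]
print(split_rhat(agree))
print(split_rhat(disagree))
```

Chains that explore the same region give a value near 1, whereas chains stuck in different regions give a value well above 1.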
The error histogram of Model 1Lip in Figure 2 shows that some errors lie far out in the left tail. The FSSN distribution can cover these errors, but the normal distribution cannot. This was the first indication that the FSSN CAR model is able to accommodate the spatial effects in the Scotland lip cancer dataset.
The next step was to set Model 2Lip as Model 2, built by including the normal CAR spatial effect, and Model 3Lip as Model 3, built by including the FSSN CAR spatial effect. The parameter estimates of Model 2Lip and Model 3Lip are given in Table 9 and Table 10, respectively. The accuracy of the three models with respect to the original Scotland lip cancer dataset can be seen in Figure 3.
Figure 3 plots the explanatory variable against the original response data and against the predicted values of the three models. Model 3Lip was closer to the original data than Model 1Lip and Model 2Lip, which means that the FSSN CAR model can capture error patterns better than the normal CAR model. To support this assertion, Figure 4 shows three plots of the original response data and their predicted values in order of observation. Throughout the range of the dataset, the predicted values of Model 3Lip were the closest to the original data. The histograms of the errors of all three models are compared in Figure 5, which shows a marked difference between the models, with Model 3Lip having the smallest error variability.
Figure 3, Figure 4 and Figure 5 give a visual comparison of the models. We then compared the models by their WAIC and LOO values, as shown in Table 11.
Table 11 provides additional evidence of the goodness of fit of Model 3Lip, which presented the smallest WAIC and LOO values of the three models. On this basis, the FSSN CAR model was statistically the most representative model for describing the Scotland lip cancer dataset.

6.2. Lung Cancer Dataset in a London Health Authority

For the second application example, we used a published lung cancer dataset that serves as a standard example for spatial modeling. This dataset is available in the GeoBUGS examples of the OpenBUGS software (https://www.mrc-bsu.cam.ac.uk/software/bugs/the-bugs-project-the-bugs-book/bugs-book-examples/). The data are simulated observed and expected counts of lung cancer incidence in males aged 65 and over living in the London Health Authority region; a ward-level index of socio-economic deprivation is also available (see Thomas et al. [18]).
In this case, the model can be written as Equation (16), where $\log \mu_i = \log E_i + \beta_0 + \beta_1 x_i + h_i + \phi_i$, $\phi_i$ is the spatial random effect, which is assigned a CAR prior distribution, and $h_i$ is a random effect for which an exchangeable normal prior distribution is assumed. The random effect for each area is thus the sum of a spatially structured component $\phi_i$ and an unstructured component $h_i$. This is termed a convolution prior, as can be seen in Besag et al. [48] and Mollié [49]. The analysis of this dataset uses the same syntax as the Scotland lip cancer dataset, with the data input and the model changed accordingly.
Model 1Lung was set as Model 1, as previously explained in STEPS A, and applied to the lung cancer dataset. The analysis followed the same steps as for the lip cancer dataset. The estimation results for the regression parameters and the second random effects in Model 1Lung can be seen in Table 12. The error pattern is shown in Figure 6.
The Scotland lip cancer dataset has already helped to validate the FSSN distribution by demonstrating its ability to detect left-skewed data. Figure 6 shows that the Model 1Lung error was right-skewed. This gives the FSSN distribution an opportunity to demonstrate its ability to detect skewness of the error in the opposite direction, and it also challenges the normal distribution to explain this error in a data-driven manner. The FSSN distribution correctly estimated a skewness parameter greater than one, i.e., $\delta = 1.1681$.
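The link between a skewness parameter greater than one and right skew can be illustrated with a minimal sketch. It is written in Python for illustration only and assumes the standard two-piece (inverse scale factor) construction of Fernández and Steel [31] applied to the normal density; the parameterization in Listing 1 of this paper may differ in detail. The function names are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fssn_pdf(x, mu, sigma, delta):
    """Fernandez-Steel skew normal density (assumed two-piece form)."""
    z = (x - mu) / sigma
    scale = 2.0 / (sigma * (delta + 1.0 / delta))
    if z >= 0:
        return scale * normal_pdf(z / delta)   # right side stretched by delta
    return scale * normal_pdf(z * delta)       # left side compressed by delta

# With delta = 1.1681 (the estimate quoted above), the density one unit to the
# right of the mode exceeds the density one unit to the left
print(fssn_pdf(1.0, 0.0, 1.0, 1.1681), fssn_pdf(-1.0, 0.0, 1.0, 1.1681))
```

Under this construction, $\delta = 1$ recovers the normal density, $\delta > 1$ puts more mass to the right of the mode, and $\delta < 1$ puts more mass to the left.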
Next, the model with the normal CAR spatial effect (Model 2Lung) and the model with the FSSN CAR spatial effect (Model 3Lung) were compared on the basis of the estimated models in Table 13 and Table 14, respectively. The predictions of the three estimated models are plotted in Figure 7. Three plots of the original response data and their predicted values in order of observation can be seen in Figure 8, which shows that the prediction of Model 3Lung was the closest to the original data. In line with the evidence in Figure 8, Figure 9 reports the histograms of the errors of the three models, which also establish that Model 3Lung was the best model, with the narrowest range of error variability.
The visual, descriptive exploration of the errors needs to be followed by a comparison of the statistical criteria for model selection, namely the WAIC and LOO values. These values can be seen in Table 15. Based on this table, Model 3Lung was the best model because it had the smallest WAIC and LOO values. Once again, the lung cancer dataset provided evidence that the FSSN CAR model is able to capture skewed data.

7. Conclusions

This paper proposes an FSSN CAR model for analyzing spatial data. The approach builds on the normal CAR model, adding a skewness parameter to capture spatial data with an asymmetrical pattern. The FSSN CAR model has demonstrated its capability to detect both symmetric and asymmetric data patterns, and it accommodates light- or heavy-tailed data. In real life, truly symmetrical data are rarely found. Thus, the FSSN distribution has wide scope for analyzing data that have an almost symmetrical pattern. With its flexibility, the FSSN CAR model is more representative for modeling spatial random effects than the normal CAR model.
This paper provides a simulation study with 24 scenarios. The first to eighth scenarios used simulated symmetrical data that were normally distributed with different variances and sample sizes. The ninth to sixteenth scenarios used simulated symmetrical data that were plausibly leptokurtic, following a double-exponential distribution with different dispersions and sample sizes. The seventeenth to twenty-fourth scenarios used simulated symmetrical data that were Student-t distributed with different degrees of freedom. The FSSN distribution was able to recover the data pattern in all 24 scenarios. Each scenario was simulated with 500 replications, and the estimation results over the replications formed a 95% HPD interval for each parameter. The HPD interval for the skewness parameter of the FSSN distribution was close to 1 in all 24 scenarios, indicating that the generated data were symmetrical. This was consistent with the generating distributions, namely, the normal, double-exponential, and Student-t distributions. Thus, from the estimated skewness parameter, it can be concluded that the FSSN distribution can estimate data generated from a normal, double-exponential, or Student-t distribution. Moreover, the HPD intervals covered the targeted parameter values in all scenarios. At each replication, we recorded the posterior estimate of each parameter, so that across replications we obtained as many posterior values per parameter as there were replications. From these posterior values, we obtained the bias, RMSE, and CP values. The bias values were close to zero for the estimated parameters in the 24 scenarios, in particular when the FSSN distribution was fitted to the generated normal, double-exponential, and Student-t data.
For the 24 scenarios, the RMSE values of the estimated parameters of the FSSN distribution were close to zero. The CP values of the estimated parameters of the FSSN distribution were greater than or equal to those of the normal distribution. On the basis of the HPD intervals, biases, RMSE, and CP of the estimated parameters, we conclude that the FSSN distribution is able to estimate and capture the characteristics of data that are normally, double-exponentially, or Student-t distributed.
In addition, we presented 8 scenarios for the multivariate case, using the Euclidean distance to measure the quality of the estimation results in each scenario. On the basis of the simulation results, the Euclidean distance for the multivariate FSSN distribution was smaller than that for the multivariate normal distribution, with the exception of the MVN1-100 and MVN1-200 scenarios for the mean vector. The results of the univariate simulation with 24 scenarios and the multivariate simulation with 8 scenarios show that the FSSN distribution is able to estimate the data generated under these scenarios. This result motivated modeling the spatial effect with the FSSN CAR model as an alternative to the normal CAR model.
The FSSN CAR model was also applied to the Scotland lip cancer dataset and the lung cancer dataset from the London Health Authority, where it was compared against the normal CAR model. We first compared the two models visually, by plotting the original data against the estimated models; the FSSN CAR model was closer to the original data than the normal CAR model. Then, on the basis of the WAIC and LOO values, the Poisson regression model with the FSSN CAR prior was also found to be better than that with the normal CAR prior. The two datasets showed the ability of the FSSN CAR model to explain left- and right-skewed patterns. In contrast, the normal CAR model was only able to accommodate symmetrical patterns with short tails.
We believe that, when data have a spatial effect, the FSSN CAR model should be recommended over the normal CAR model, because it can capture a variety of data patterns, whereas the normal CAR model is suited only to symmetrical data with short tails. However, for the parameter estimation, the normal CAR and FSSN CAR models give almost the same results.

Author Contributions

D.R., N.I. and I. designed the research; D.R. collected and analyzed the data and drafted the paper. All authors have critically read and revised the draft and approved the final paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ministry of Research, Technology, and Higher Education Indonesia, which provided a scholarship through the Program Magister Menuju Doktor Untuk Sarjana Unggul (PMDSU).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available in a publicly accessible repository.

Acknowledgments

The authors thank the referees for their helpful comments.

Conflicts of Interest

None to declare.

Appendix A

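For $\phi \sim N_n(0, [\tau(D - W)]^{-1})$, we set $\tau = 1$, so the random variable $\phi = (\phi_1, \ldots, \phi_n)^T$ follows the multivariate normal distribution $\phi \sim N_n(0, [D - W]^{-1})$ with the probability density function (p.d.f.):
$$p_{\text{Normal}}(\phi \mid 0, [D - W]^{-1}) = (2\pi)^{-\frac{n}{2}} \left| [D - W]^{-1} \right|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} \phi^T [D - W] \phi \right).$$
$(2\pi)^{-\frac{n}{2}}$ and $\left| [D - W]^{-1} \right|^{-\frac{1}{2}}$ are constant, so that the p.d.f. can be rewritten as
$$p(\phi) \propto \exp\left( -\frac{1}{2} \phi^T [D - W] \phi \right).$$
Stan computes on the log scale, so the log probability density is, up to an additive constant,
$$\begin{aligned}
\log(p(\phi)) &= -\frac{1}{2} \phi^T [D - W] \phi \\
&= -\frac{1}{2} \begin{pmatrix} \phi_1 & \phi_2 & \cdots & \phi_n \end{pmatrix} \left[ \begin{pmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \cdots & d_{nn} \end{pmatrix} - \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nn} \end{pmatrix} \right] \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_n \end{pmatrix} \\
&= -\frac{1}{2} \begin{pmatrix} \phi_1 & \phi_2 & \cdots & \phi_n \end{pmatrix} \begin{pmatrix} d_{11} - w_{11} & d_{12} - w_{12} & \cdots & d_{1n} - w_{1n} \\ d_{21} - w_{21} & d_{22} - w_{22} & \cdots & d_{2n} - w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} - w_{n1} & d_{n2} - w_{n2} & \cdots & d_{nn} - w_{nn} \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_n \end{pmatrix} \\
&= -\frac{1}{2} \begin{pmatrix} \phi_1 (d_{11} - w_{11}) + \phi_2 (d_{21} - w_{21}) + \cdots + \phi_n (d_{n1} - w_{n1}) \\ \phi_1 (d_{12} - w_{12}) + \phi_2 (d_{22} - w_{22}) + \cdots + \phi_n (d_{n2} - w_{n2}) \\ \vdots \\ \phi_1 (d_{1n} - w_{1n}) + \phi_2 (d_{2n} - w_{2n}) + \cdots + \phi_n (d_{nn} - w_{nn}) \end{pmatrix}^T \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_n \end{pmatrix} \\
&= -\frac{1}{2} \Big[ \{ \phi_1 (d_{11} - w_{11}) \phi_1 + \phi_2 (d_{21} - w_{21}) \phi_1 + \cdots + \phi_n (d_{n1} - w_{n1}) \phi_1 \} \\
&\qquad + \{ \phi_1 (d_{12} - w_{12}) \phi_2 + \phi_2 (d_{22} - w_{22}) \phi_2 + \cdots + \phi_n (d_{n2} - w_{n2}) \phi_2 \} \\
&\qquad + \cdots + \{ \phi_1 (d_{1n} - w_{1n}) \phi_n + \phi_2 (d_{2n} - w_{2n}) \phi_n + \cdots + \phi_n (d_{nn} - w_{nn}) \phi_n \} \Big] \\
&= -\frac{1}{2} \Big[ \{ (\phi_1 \phi_1 d_{11} + \phi_2 \phi_1 d_{21} + \cdots + \phi_n \phi_1 d_{n1}) + (\phi_1 \phi_2 d_{12} + \phi_2 \phi_2 d_{22} + \cdots + \phi_n \phi_2 d_{n2}) \\
&\qquad + \cdots + (\phi_1 \phi_n d_{1n} + \phi_2 \phi_n d_{2n} + \cdots + \phi_n \phi_n d_{nn}) \} \\
&\qquad - \{ (\phi_1 \phi_1 w_{11} + \phi_2 \phi_1 w_{21} + \cdots + \phi_n \phi_1 w_{n1}) + (\phi_1 \phi_2 w_{12} + \phi_2 \phi_2 w_{22} + \cdots + \phi_n \phi_2 w_{n2}) \\
&\qquad + \cdots + (\phi_1 \phi_n w_{1n} + \phi_2 \phi_n w_{2n} + \cdots + \phi_n \phi_n w_{nn}) \} \Big]
\end{aligned}$$
where $d_{i,i}$ is the number of neighbors of area $s_i$ and $d_{i,j} = 0$ for $i \neq j$, and $w_{i,j}$ describes the neighborhood between area $s_i$ and area $s_j$: $w_{i,j} = 1$ if area $s_i$ is adjacent to area $s_j$, and $w_{i,j} = 0$ otherwise (in particular, $w_{i,i} = 0$). Thus, the previous equation can be rewritten as:
$$\begin{aligned}
\log(p(\phi)) &= -\frac{1}{2} \Big[ \{ (\phi_1 \phi_1 d_{11} + \phi_2 \phi_2 d_{22} + \cdots + \phi_n \phi_n d_{nn}) \} - \{ (\phi_2 \phi_1 w_{21} + \cdots + \phi_n \phi_1 w_{n1}) \\
&\qquad + (\phi_1 \phi_2 w_{12} + \phi_3 \phi_2 w_{32} + \cdots + \phi_n \phi_2 w_{n2}) + \cdots + (\phi_1 \phi_n w_{1n} + \phi_2 \phi_n w_{2n} + \cdots + \phi_{n-1} \phi_n w_{n-1,n}) \} \Big] \\
&= -\frac{1}{2} \left[ \left\{ \sum_{i=1}^{n} \phi_i \phi_i d_{ii} \right\} - \left\{ \sum_{i \sim j} \phi_i \phi_j w_{ij} + \sum_{j \sim i} \phi_j \phi_i w_{ji} \right\} \right] \\
&= -\frac{1}{2} \left[ \left\{ \sum_{i=1}^{n} \phi_i \phi_i d_{ii} \right\} - \left\{ 2 \sum_{i \sim j} \phi_i \phi_j w_{ij} \right\} \right]
\end{aligned}$$
where, as previously explained, if area $s_i$ and area $s_j$ are neighbors, then $w_{i,j} = 1$; therefore, the value of $w_{i,j}$ in the previous equation is replaced by 1 and the equation can be written as
$$\log(p(\phi)) = -\frac{1}{2} \left[ \left\{ \sum_{i=1}^{n} \phi_i^2 d_{ii} \right\} - \left\{ 2 \sum_{i \sim j} \phi_i \phi_j \right\} \right].$$
To illustrate the term $\sum_{i=1}^{n} \phi_i^2 d_{ii}$, consider the neighborhood between regions $s_1$, $s_2$, $s_3$, $s_4$, and $s_5$ shown in the following figure.
[Figure: layout of the regions $s_1$, $s_2$, $s_3$, $s_4$, and $s_5$; the neighboring pairs are those encoded in the matrix $W$ below.]
So, we get the diagonal matrix $D$ and the symmetric matrix $W$ as follows:
$$D = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix} \quad \text{and} \quad W = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{pmatrix}.$$
Based on the matrix $D$, the term $\sum_{i=1}^{n} \phi_i^2 d_{ii}$ can be decomposed into
$$\begin{aligned}
\sum_{i=1}^{n} \phi_i^2 d_{ii} &= \phi_1^2 d_{11} + \phi_2^2 d_{22} + \phi_3^2 d_{33} + \phi_4^2 d_{44} + \phi_5^2 d_{55} \\
&= \phi_1^2 \times 1 + \phi_2^2 \times 3 + \phi_3^2 \times 3 + \phi_4^2 \times 1 + \phi_5^2 \times 2 \\
&= \phi_1^2 + \phi_2^2 + \phi_2^2 + \phi_2^2 + \phi_3^2 + \phi_3^2 + \phi_3^2 + \phi_4^2 + \phi_5^2 + \phi_5^2
\end{aligned}$$
and the summation in the above equation can be sorted by neighboring regions:
$$\sum_{i=1}^{n} \phi_i^2 d_{ii} = (\phi_1^2 + \phi_2^2) + (\phi_2^2 + \phi_3^2) + (\phi_2^2 + \phi_5^2) + (\phi_3^2 + \phi_4^2) + (\phi_3^2 + \phi_5^2).$$
Thus, the term $\sum_{i=1}^{n} \phi_i^2 d_{ii}$ can be written as $\sum_{i \sim j} (\phi_i^2 + \phi_j^2)$.
Finally, the log probability density can be written as follows.
$$\begin{aligned}
\log(p(\phi)) &= -\frac{1}{2} \left[ \left\{ \sum_{i=1}^{n} \phi_i^2 d_{ii} \right\} - \left\{ 2 \sum_{i \sim j} \phi_i \phi_j \right\} \right] \\
&= -\frac{1}{2} \left[ \left\{ \sum_{i \sim j} (\phi_i^2 + \phi_j^2) \right\} - \left\{ 2 \sum_{i \sim j} \phi_i \phi_j \right\} \right] \\
&= -\frac{1}{2} \left[ \sum_{i \sim j} \phi_i^2 - 2 \sum_{i \sim j} \phi_i \phi_j + \sum_{i \sim j} \phi_j^2 \right] \\
&= -\frac{1}{2} \left[ \sum_{i \sim j} (\phi_i^2 - 2 \phi_i \phi_j + \phi_j^2) \right] \\
&= -\frac{1}{2} \left[ \sum_{i \sim j} (\phi_i - \phi_j)^2 \right]
\end{aligned}$$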
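The final identity of this appendix can be checked numerically on the five-region illustration. The following minimal sketch is written in Python for illustration only and is not part of the RStan syntax of this paper. It compares the quadratic form $\phi^T [D - W] \phi$ with the pairwise sum $\sum_{i \sim j} (\phi_i - \phi_j)^2$; the function names and the values of $\phi$ are hypothetical.

```python
def quadratic_form(phi, num_neighbors, edges):
    """phi^T (D - W) phi, with D diagonal and W a 0/1 adjacency matrix."""
    diag_part = sum(d * p * p for d, p in zip(num_neighbors, phi))
    # Each undirected edge contributes w_ij and w_ji, hence the factor 2
    cross_part = sum(2.0 * phi[i - 1] * phi[j - 1] for i, j in edges)
    return diag_part - cross_part

def pairwise_form(phi, edges):
    """Sum over neighboring pairs i ~ j of (phi_i - phi_j)^2."""
    return sum((phi[i - 1] - phi[j - 1]) ** 2 for i, j in edges)

# Five-region illustration from this appendix (1-based region labels)
edges = [(1, 2), (2, 3), (2, 5), (3, 4), (3, 5)]
num_neighbors = [1, 3, 3, 1, 2]        # diagonal of D
phi = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical values

print(quadratic_form(phi, num_neighbors, edges), pairwise_form(phi, edges))
# The log density, up to an additive constant, is -0.5 times either value
```

The pairwise form only needs the list of edges, which is why the Stan implementation works with the paired node lists rather than with the full matrices $D$ and $W$.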

References

  1. Banerjee, S.; Wall, M.M.; Carlin, B.P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota. Biostatistics 2003, 4, 123–142.
  2. Darmofal, D. Bayesian Spatial Survival Models for Political Event Processes. Am. J. Pol. Sci. 2009, 53, 241–257.
  3. Rantini, D.; Candrawengi, N.L.P.I.; Iriawan, N.; Irhamah; Rusli, M. On the Computational Bayesian Survival Spatial DHF Modelling with CAR Frailty. AIP Conf. Proc. 2021, 2329, 60028.
  4. Cressie, N.; Wikle, C.K. Statistics for Spatio-Temporal Data; John Wiley and Sons: Hoboken, NJ, USA, 2015.
  5. Iriawan, N.; Astutik, S.; Prastyo, D.D. Markov Chain Monte Carlo—Based Approaches for Modeling the Spatial Survival with Conditional Autoregressive (CAR) Frailty. Int. J. Comput. Sci. Netw. Secur. 2010, 10, 211–217.
  6. Aswi, A.; Cramb, S.; Duncan, E.; Hu, W.; White, G.; Mengersen, K. Bayesian Spatial Survival Models for Hospitalisation of Dengue: A Case Study of Wahidin Hospital in Makassar, Indonesia. Int. J. Environ. Res. Public Health 2020, 17, 878.
  7. Motarjem, K.; Mohammadzadeh, M.; Abyar, A. Bayesian Analysis of Spatial Survival Model with Non-Gaussian Random Effect. J. Math. Sci. 2019, 237, 692–701.
  8. Rantini, D.; Iriawan, N.; Irhamah. Bayesian Mixture Generalized Extreme Value Regression with Double-Exponential CAR Frailty for Dengue Haemorrhagic Fever in Pamekasan, East Java, Indonesia. J. Phys. Conf. Ser. 2021, 1752, 12022.
  9. Rantini, D.; Abdullah, M.N.; Iriawan, N.; Irhamah; Rusli, M. On the Computational Bayesian Survival Spatial Dengue Hemorrhagic Fever (DHF) Modeling with Double-Exponential CAR Frailty. J. Phys. Conf. Ser. 2021, 1722, 012042.
  10. Mbalawata, I.S.; Särkkä, S.; Haario, H. Parameter Estimation in Stochastic Differential Equations with Markov Chain Monte Carlo and Non-Linear Kalman Filtering. Comput. Stat. 2013, 28, 1195–1223.
  11. Duane, S.; Kennedy, A.D.; Pendleton, B.J.; Roweth, D. Hybrid Monte Carlo. Phys. Lett. B 1987, 195, 216–222.
  12. Neal, R.M. MCMC Using Hamiltonian Dynamics. In Handbook of Markov Chain Monte Carlo; Chapman and Hall: London, UK, 2011; pp. 113–162.
  13. Chen, T.; Fox, E.; Guestrin, C. Stochastic Gradient Hamiltonian Monte Carlo. In Proceedings of the International Conference on Machine Learning, PMLR, Beijing, China, 21–26 June 2014; pp. 1683–1691.
  14. Fichtner, A.; Simutė, S. Hamiltonian Monte Carlo Inversion of Seismic Sources in Complex Media. J. Geophys. Res. Solid Earth 2018, 123, 2984–2999.
  15. Girolami, M.; Calderhead, B. Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods. J. R. Stat. Soc. Ser. B Stat. Methodol. 2011, 73, 123–214.
  16. Betancourt, M.; Byrne, S.; Livingstone, S.; Girolami, M. The Geometric Foundations of Hamiltonian Monte Carlo. Bernoulli 2017, 23, 2257–2298.
  17. Livingstone, S.; Betancourt, M.; Byrne, S.; Girolami, M. On the Geometric Ergodicity of Hamiltonian Monte Carlo. Bernoulli 2019, 25, 3109–3138.
  18. Thomas, A.; Best, N.; Lunn, D.; Arnold, R.; Spiegelhalter, D. GeoBugs User Manual; Cambridge Medical Research Council Biostatistics Unit: Cambridge, UK, 2004.
  19. Monnahan, C.C.; Thorson, J.T.; Branch, T.A. Faster Estimation of Bayesian Models in Ecology Using Hamiltonian Monte Carlo. Methods Ecol. Evol. 2017, 339–348.
  20. Annis, J.; Miller, B.J.; Palmeri, T.J. Bayesian Inference with Stan: A Tutorial on Adding Custom Distributions. Behav. Res. Methods 2017, 49, 863–886.
  21. Wetzels, R.; Lee, M.D.; Wagenmakers, E.J. Bayesian Inference Using WBDev: A Tutorial for Social Scientists. Behav. Res. Methods 2010, 42, 884–897.
  22. Wabersich, D.; Vandekerckhove, J. Extending JAGS: A Tutorial on Adding Custom Distributions to JAGS (with a Diffusion Model Example). Behav. Res. Methods 2014, 46, 15–28.
  23. Ghaderinezhad, F.; Ley, C.; Loperfido, N. Bayesian Inference for Skew-Symmetric Distributions. Symmetry 2020, 12, 491.
  24. Castillo, N.O.; Gómez, H.W.; Leiva, V.; Sanhueza, A. On the Fernández–Steel Distribution: Inference and Application. Comput. Stat. Data Anal. 2011, 55, 2951–2961.
  25. Azzalini, A. The Skew-Normal Distribution and Related Multivariate Families. Scand. J. Stat. 2005, 32, 159–188.
  26. Zhang, J.; Shields, M.D. On the Quantification and Efficient Propagation of Imprecise Probabilities Resulting from Small Datasets. Mech. Syst. Signal Process. 2018, 98, 465–483.
  27. Beer, M.; Ferson, S.; Kreinovich, V. Imprecise Probabilities in Engineering Analyses. Mech. Syst. Signal Process. 2013, 37, 4–29.
  28. Torre, E.; Marelli, S.; Embrechts, P.; Sudret, B. A General Framework for Data-Driven Uncertainty Quantification under Complex Input Dependencies Using Vine Copulas. Probabilistic Eng. Mech. 2019, 55, 1–16.
  29. Zhang, J.; Shields, M.D. Efficient Monte Carlo Resampling for Probability Measure Changes from Bayesian Updating. Probabilistic Eng. Mech. 2019, 55, 54–66.
  30. Zhang, J.; Shields, M. On the Quantification and Efficient Propagation of Imprecise Probabilities with Copula Dependence. Int. J. Approx. Reason. 2020, 122, 24–46.
  31. Fernández, C.; Steel, M.F.J. On Bayesian Modeling of Fat Tails and Skewness. J. Am. Stat. Assoc. 1998, 93, 359–371.
  32. Morris, M.; Wheeler-Martin, K.; Simpson, D.; Mooney, S.J.; Gelman, A.; DiMaggio, C. Bayesian Hierarchical Spatial Models: Implementing the Besag York Mollié Model in Stan. Spat. Spatiotempor. Epidemiol. 2019, 31, 1–18.
  33. Besag, J. Spatial Interaction and the Statistical Analysis of Lattice Systems. J. R. Stat. Soc. Ser. B 1974, 36, 192–225.
  34. Banerjee, S.; Carlin, B.P.; Gelfand, A.E. Hierarchical Modeling and Analysis for Spatial Data; Chapman and Hall: London, UK, 2014.
  35. Ferreira, J.T.A.S.; Steel, M.F.J. A New Class of Skewed Multivariate Distributions with Applications to Regression Analysis. Stat. Sin. 2007, 17, 505–529.
  36. Walther, B.A.; Moore, J.L. The Concepts of Bias, Precision and Accuracy, and Their Use in Testing the Performance of Species Richness Estimators, with a Literature Review of Estimator Performance. Ecography 2005, 28, 815–829.
  37. Andronescu, M.; Condon, A.; Hoos, H.H.; Mathews, D.H.; Murphy, K.P. Computational Approaches for RNA Energy Parameter Estimation. RNA 2010, 16, 2304–2318.
  38. Zhao, H.; Tian, L. On Estimating Medical Cost and Incremental Cost-Effectiveness Ratios with Censored Data. Biometrics 2001, 57, 1002–1008.
  39. Hitchcock, S.; Hogg, R.V.; Craig, A.T. Introduction to Mathematical Statistics; Pearson Education: London, UK, 1966; Volume 129.
  40. Dokmanic, I.; Parhizkar, R.; Ranieri, J.; Vetterli, M. Euclidean Distance Matrices: Essential Theory, Algorithms, and Applications. IEEE Signal Process. Mag. 2015, 32, 12–30.
  41. Lele, S. Euclidean Distance Matrix Analysis (EDMA): Estimation of Mean Form and Mean Form Difference. Math. Geol. 1993, 25, 573–602.
  42. Lax, P.D. Linear Algebra and Its Applications, 2nd ed.; John Wiley and Sons: Hoboken, NJ, USA, 2013.
  43. Watanabe, S. Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. J. Mach. Learn. Res. 2010, 11, 3571–3594.
  44. Vehtari, A.; Gelman, A.; Gabry, J. Practical Bayesian Model Evaluation Using Leave-One-out Cross-Validation and WAIC. Stat. Comput. 2017, 27, 1413–1432.
  45. Clayton, D.; Kaldor, J. Empirical Bayes Estimates of Age-Standardized Relative Risks for Use in Disease Mapping. Biometrics 1987, 43, 671–681.
  46. Gelman, A.; Lee, D.; Guo, J. Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization. J. Educ. Behav. Stat. 2015, 20, 1–14.
  47. Carpenter, B.; Gelman, A.; Hoffman, M.D.; Lee, D.; Goodrich, B.; Betancourt, M.; Brubaker, M.A.; Guo, J.; Li, P.; Riddell, A. Stan: A Probabilistic Programming Language. J. Stat. Softw. 2017, 76, 1–32.
  48. Besag, J.; York, J.; Mollié, A. Bayesian Image Restoration, with Two Applications in Spatial Statistics. Ann. Inst. Stat. Math. 1991, 43, 1–20.
  49. Mollié, A. Bayesian Mapping of Disease. Markov Chain Mt. Carlo Pract. 1996, 1, 359–379.
Figure 1. The steps for modeling using Stan in the R interface (RStan).
Figure 2. Plots of the error of Model 1Lip with approximated normal and Fernandez–Steel skew normal (FSSN) distributions.
Figure 3. Plots of the original data with the estimated model for (a) Model 1Lip, (b) Model 2Lip, and (c) Model 3Lip.
Figure 4. Plots of the original response data and predicted values for Model 1Lip, Model 2Lip, and Model 3Lip based on the observation order.
Figure 5. Error histogram plots of Model 1Lip, Model 2Lip, and Model 3Lip.
Figure 6. Plots of the Model 1Lung error histogram with approximated normal and FSSN distributions.
Figure 7. Plots of the original data with the estimated model for (a) Model 1Lung, (b) Model 2Lung, and (c) Model 3Lung.
Figure 8. Plots of the original response data and predicted values for Model 1Lung, Model 2Lung, and Model 3Lung based on the observation order.
Figure 9. Error histogram plots of Model 1Lung, Model 2Lung, and Model 3Lung.
Table 1. Twenty-four scenario simulations for univariate normal (N), double-exponential (DE), and Student-t (T) distributions.

| Distribution | Scenario | Sample Size |
|---|---|---|
| Normal (0,2) | N1-125 | 125 |
| Normal (0,2) | N1-250 | 250 |
| Normal (0,2) | N1-500 | 500 |
| Normal (0,2) | N1-1000 | 1000 |
| Normal (0,10) | N2-125 | 125 |
| Normal (0,10) | N2-250 | 250 |
| Normal (0,10) | N2-500 | 500 |
| Normal (0,10) | N2-1000 | 1000 |
| DE (0,1) | DE1-125 | 125 |
| DE (0,1) | DE1-250 | 250 |
| DE (0,1) | DE1-500 | 500 |
| DE (0,1) | DE1-1000 | 1000 |
| DE (0,4) | DE2-125 | 125 |
| DE (0,4) | DE2-250 | 250 |
| DE (0,4) | DE2-500 | 500 |
| DE (0,4) | DE2-1000 | 1000 |
| t (5) | T1-125 | 125 |
| t (5) | T1-250 | 250 |
| t (5) | T1-500 | 500 |
| t (5) | T1-1000 | 1000 |
| t (7) | T2-125 | 125 |
| t (7) | T2-250 | 250 |
| t (7) | T2-500 | 500 |
| t (7) | T2-1000 | 1000 |
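Table 2. The 95% highest posterior density (HPD) interval for the estimated parameters of 24 scenario simulations for normal (N), double-exponential (DE), and Student-t (T) distributions.

| Scenario | Distribution | Target $\mu$ | Target $\sigma$ | Target $\delta$ | $\hat{\mu}$ LL | $\hat{\mu}$ UL | $\hat{\sigma}$ LL | $\hat{\sigma}$ UL | $\hat{\delta}$ LL | $\hat{\delta}$ UL |
|---|---|---|---|---|---|---|---|---|---|---|
| N1-125 | Normal | 0 | 2 | - | −0.0765 | 0.0866 | 1.9100 | 2.0847 | - | - |
| N1-125 | FSSN | 0 | 2 | 1 | −0.0435 | 0.0404 | 1.9095 | 2.0842 | 0.9295 | 1.0776 |
| N1-250 | Normal | 0 | 2 | - | −0.0878 | 0.0910 | 1.8988 | 2.1010 | - | - |
| N1-250 | FSSN | 0 | 2 | 1 | −0.0580 | 0.0505 | 1.8975 | 2.0991 | 0.9420 | 1.0640 |
| N1-500 | Normal | 0 | 2 | - | −0.0971 | 0.1005 | 1.9071 | 2.0822 | - | - |
| N1-500 | FSSN | 0 | 2 | 1 | −0.0598 | 0.0701 | 1.9065 | 2.0815 | 0.9522 | 1.0504 |
| N1-1000 | Normal | 0 | 2 | - | −0.0875 | 0.0843 | 1.9260 | 2.0713 | - | - |
| N1-1000 | FSSN | 0 | 2 | 1 | −0.0793 | 0.0722 | 1.9253 | 2.0703 | 0.9627 | 1.0417 |
| N2-125 | Normal | 0 | 10 | - | −0.0241 | 0.0228 | 9.9718 | 10.0355 | - | - |
| N2-125 | FSSN | 0 | 10 | 1 | −0.0092 | 0.0093 | 9.9722 | 10.0355 | 0.9182 | 1.0829 |
| N2-250 | Normal | 0 | 10 | - | −0.0302 | 0.0275 | 9.9643 | 10.0450 | - | - |
| N2-250 | FSSN | 0 | 10 | 1 | −0.0126 | 0.0130 | 9.9628 | 10.0435 | 0.9412 | 1.0593 |
| N2-500 | Normal | 0 | 10 | - | −0.0411 | 0.0437 | 9.9418 | 10.0645 | - | - |
| N2-500 | FSSN | 0 | 10 | 1 | −0.0177 | 0.0174 | 9.9423 | 10.0635 | 0.9537 | 1.0483 |
| N2-1000 | Normal | 0 | 10 | - | −0.0593 | 0.0552 | 9.9303 | 10.0701 | - | - |
| N2-1000 | FSSN | 0 | 10 | 1 | −0.0247 | 0.0250 | 9.9302 | 10.0701 | 0.9656 | 1.0345 |
| DE1-125 | DE | 0 | 1 | - | −0.0912 | 0.0966 | 0.9005 | 1.0875 | - | - |
| DE1-125 | FSSN | 0 | 1 | 1 | −0.0740 | 0.0648 | 0.8572 | 1.1086 | 0.8844 | 1.1172 |
| DE1-250 | DE | 0 | 1 | - | −0.0704 | 0.0761 | 0.9183 | 1.0870 | - | - |
| DE1-250 | FSSN | 0 | 1 | 1 | −0.0776 | 0.0755 | 0.8864 | 1.1027 | 0.9058 | 1.1196 |
| DE1-500 | DE | 0 | 1 | - | −0.0522 | 0.0620 | 0.9361 | 1.0745 | - | - |
| DE1-500 | FSSN | 0 | 1 | 1 | −0.0766 | 0.0792 | 0.9166 | 1.0911 | 0.9219 | 1.0877 |
| DE1-1000 | DE | 0 | 1 | - | −0.0432 | 0.0399 | 0.9487 | 1.0608 | - | - |
| DE1-1000 | FSSN | 0 | 1 | 1 | −0.0701 | 0.0731 | 0.9306 | 1.0661 | 0.9252 | 1.0750 |
| DE2-125 | DE | 0 | 4 | - | −0.0691 | 0.0680 | 3.9499 | 4.0466 | - | - |
| DE2-125 | FSSN | 0 | 4 | 1 | −0.0282 | 0.0254 | 3.9008 | 4.0846 | 0.8866 | 1.1167 |
| DE2-250 | DE | 0 | 4 | - | −0.0775 | 0.0838 | 3.9363 | 4.0647 | - | - |
| DE2-250 | FSSN | 0 | 4 | 1 | −0.0363 | 0.0360 | 3.8720 | 4.1196 | 0.9173 | 1.1045 |
| DE2-500 | DE | 0 | 4 | - | −0.0887 | 0.1013 | 3.9248 | 4.0820 | - | - |
| DE2-500 | FSSN | 0 | 4 | 1 | −0.0498 | 0.0517 | 3.8554 | 4.1497 | 0.9376 | 1.0717 |
| DE2-1000 | DE | 0 | 4 | - | −0.1014 | 0.0927 | 3.9093 | 4.0986 | - | - |
| DE2-1000 | FSSN | 0 | 4 | 1 | −0.0639 | 0.0678 | 3.8340 | 4.1506 | 0.9421 | 1.0520 |
| T1-125 | t | - | - | - | - | - | - | - | - | - |
| T1-125 | FSSN | 0 | 1.2910 | 1 | −0.0668 | 0.0736 | 1.1381 | 1.4270 | 0.8834 | 1.1129 |
| T1-250 | t | - | - | - | - | - | - | - | - | - |
| T1-250 | FSSN | 0 | 1.2910 | 1 | −0.0853 | 0.0965 | 1.1577 | 1.4448 | 0.8904 | 1.1189 |
| T1-500 | t | - | - | - | - | - | - | - | - | - |
| T1-500 | FSSN | 0 | 1.2910 | 1 | −0.1093 | 0.1294 | 1.1773 | 1.4268 | 0.9020 | 1.1053 |
| T1-1000 | t | - | - | - | - | - | - | - | - | - |
| T1-1000 | FSSN | 0 | 1.2910 | 1 | −0.1242 | 0.1299 | 1.2021 | 1.3881 | 0.9068 | 1.1023 |
| T2-125 | t | - | - | - | - | - | - | - | - | - |
| T2-125 | FSSN | 0 | 1.1832 | 1 | −0.0638 | 0.0648 | 1.0598 | 1.3055 | 0.8954 | 1.1035 |
| T2-250 | t | - | - | - | - | - | - | - | - | - |
| T2-250 | FSSN | 0 | 1.1832 | 1 | −0.0848 | 0.0778 | 1.0718 | 1.2841 | 0.9095 | 1.0980 |
| T2-500 | t | - | - | - | - | - | - | - | - | - |
| T2-500 | FSSN | 0 | 1.1832 | 1 | −0.1002 | 0.1037 | 1.1027 | 1.2808 | 0.9214 | 1.0938 |
| T2-1000 | t | - | - | - | - | - | - | - | - | - |
| T2-1000 | FSSN | 0 | 1.1832 | 1 | −0.1082 | 0.0946 | 1.1086 | 1.2488 | 0.9339 | 1.0763 |

LL and UL are the lower limit and upper limit of the HPD interval, respectively.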
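The targeted skew parameter in Table 2 is δ = 1 because the FSSN distribution reduces to the normal distribution at that value. The sketch below illustrates this with the standard Fernandez–Steel inverse-scale-factor construction. It is an assumption that this matches the exact parameterization used in the paper's Stan code; the function names `normal_pdf` and `fssn_pdf` are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density evaluated at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fssn_pdf(x, mu, sigma, delta):
    """Fernandez-Steel skew normal density (assumed parameterization).

    The right side of mu is stretched by delta and the left side is
    shrunk by delta, so delta > 1 gives right skew, delta < 1 gives
    left skew, and delta == 1 recovers the normal density.
    """
    z = (x - mu) / sigma
    scale = 2.0 / (sigma * (delta + 1.0 / delta))
    if z >= 0:
        return scale * normal_pdf(z / delta)
    return scale * normal_pdf(z * delta)


# With delta = 1 the density coincides with Normal(mu, sigma).
symmetric = fssn_pdf(0.5, mu=0.0, sigma=2.0, delta=1.0)
reference = normal_pdf(0.25) / 2.0
print(abs(symmetric - reference) < 1e-12)
```

Under this construction the probability mass to the right of μ is δ²/(1 + δ²), so an HPD interval for δ that contains 1 is consistent with symmetric data.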
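Table 3. Bias for the estimated parameters of 24 scenario simulations for normal (N), double-exponential (DE), and Student-t (T) distributions.

| Scenario | Distribution | Target $\mu$ | Target $\sigma$ | Target $\delta$ | Bias of $\hat{\mu}$ | Bias of $\hat{\sigma}$ | Bias of $\hat{\delta}$ |
|---|---|---|---|---|---|---|---|
| N1-125 | Normal | 0 | 2 | - | 0.00133 | 0.00112 | - |
| N1-125 | FSSN | 0 | 2 | 1 | 0.00054 | −0.00038 | 0.00295 |
| N1-250 | Normal | 0 | 2 | - | −0.00331 | −0.00049 | - |
| N1-250 | FSSN | 0 | 2 | 1 | −0.00251 | −0.00190 | 0.00066 |
| N1-500 | Normal | 0 | 2 | - | 0.00273 | −0.00152 | - |
| N1-500 | FSSN | 0 | 2 | 1 | 0.00111 | −0.00267 | 0.00180 |
| N1-1000 | Normal | 0 | 2 | - | 0.00105 | 0.00155 | - |
| N1-1000 | FSSN | 0 | 2 | 1 | −0.00033 | 0.00064 | 0.00120 |
| N2-125 | Normal | 0 | 10 | - | −0.00005 | 0.00200 | - |
| N2-125 | FSSN | 0 | 10 | 1 | −0.00030 | 0.00070 | 0.00300 |
| N2-250 | Normal | 0 | 10 | - | −0.00050 | 0.00080 | - |
| N2-250 | FSSN | 0 | 10 | 1 | −0.00010 | 0.00020 | 0.00020 |
| N2-500 | Normal | 0 | 10 | - | −0.00200 | −0.00030 | - |
| N2-500 | FSSN | 0 | 10 | 1 | −0.00060 | −0.00090 | −0.00200 |
| N2-1000 | Normal | 0 | 10 | - | −0.00200 | −0.00100 | - |
| N2-1000 | FSSN | 0 | 10 | 1 | −0.00100 | −0.00200 | 0.00004 |
| DE1-125 | DE | 0 | 1 | - | 0.00201 | 0.00170 | - |
| DE1-125 | FSSN | 0 | 1 | 1 | 0.00172 | −0.00721 | 0.00198 |
| DE1-250 | DE | 0 | 1 | - | 0.00104 | 0.00130 | - |
| DE1-250 | FSSN | 0 | 1 | 1 | −0.00126 | −0.00463 | 0.00449 |
| DE1-500 | DE | 0 | 1 | - | 0.00227 | 0.00290 | - |
| DE1-500 | FSSN | 0 | 1 | 1 | 0.00061 | −0.00195 | 0.00192 |
| DE1-1000 | DE | 0 | 1 | - | −0.00110 | 0.00398 | - |
| DE1-1000 | FSSN | 0 | 1 | 1 | −0.00068 | 0.00045 | 0.00039 |
| DE2-125 | DE | 0 | 4 | - | 0.00150 | −0.00040 | - |
| DE2-125 | FSSN | 0 | 4 | 1 | 0.00052 | −0.00590 | 0.00250 |
| DE2-250 | DE | 0 | 4 | - | 0.00130 | −0.00120 | - |
| DE2-250 | FSSN | 0 | 4 | 1 | −0.00097 | −0.00640 | 0.00380 |
| DE2-500 | DE | 0 | 4 | - | 0.00370 | 0.00079 | - |
| DE2-500 | FSSN | 0 | 4 | 1 | 0.00008 | −0.00450 | 0.00170 |
| DE2-1000 | DE | 0 | 4 | - | −0.00250 | 0.00450 | - |
| DE2-1000 | FSSN | 0 | 4 | 1 | −0.00077 | 0.00018 | −0.00028 |
| T1-125 | t | - | - | - | - | - | - |
| T1-125 | FSSN | 0 | 1.2910 | 1 | −0.00126 | −0.00980 | 0.00309 |
| T1-250 | t | - | - | - | - | - | - |
| T1-250 | FSSN | 0 | 1.2910 | 1 | −0.00040 | −0.00423 | 0.00225 |
| T1-500 | t | - | - | - | - | - | - |
| T1-500 | FSSN | 0 | 1.2910 | 1 | 0.00026 | −0.00760 | 0.00133 |
| T1-1000 | t | - | - | - | - | - | - |
| T1-1000 | FSSN | 0 | 1.2910 | 1 | −0.00037 | −0.00295 | 0.00049 |
| T2-125 | t | - | - | - | - | - | - |
| T2-125 | FSSN | 0 | 1.1832 | 1 | 0.00242 | −0.00742 | 0.00004 |
| T2-250 | t | - | - | - | - | - | - |
| T2-250 | FSSN | 0 | 1.1832 | 1 | −0.00038 | −0.00332 | 0.00191 |
| T2-500 | t | - | - | - | - | - | - |
| T2-500 | FSSN | 0 | 1.1832 | 1 | −0.00303 | −0.00058 | 0.00399 |
| T2-1000 | t | - | - | - | - | - | - |
| T2-1000 | FSSN | 0 | 1.1832 | 1 | −0.00159 | −0.00262 | 0.00258 |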
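Table 4. Root-mean-squared error (RMSE) for the estimated parameters of 24 scenario simulations for normal (N), double-exponential (DE), and Student-t (T) distributions.

| Scenario | Distribution | Target $\mu$ | Target $\sigma$ | Target $\delta$ | RMSE of $\hat{\mu}$ | RMSE of $\hat{\sigma}$ | RMSE of $\hat{\delta}$ |
|---|---|---|---|---|---|---|---|
| N1-125 | Normal | 0 | 2 | - | 0.0424 | 0.0454 | - |
| N1-125 | FSSN | 0 | 2 | 1 | 0.0215 | 0.0457 | 0.0385 |
| N1-250 | Normal | 0 | 2 | - | 0.0478 | 0.0517 | - |
| N1-250 | FSSN | 0 | 2 | 1 | 0.0279 | 0.0518 | 0.0313 |
| N1-500 | Normal | 0 | 2 | - | 0.0506 | 0.0457 | - |
| N1-500 | FSSN | 0 | 2 | 1 | 0.0346 | 0.0460 | 0.0249 |
| N1-1000 | Normal | 0 | 2 | - | 0.0437 | 0.0385 | - |
| N1-1000 | FSSN | 0 | 2 | 1 | 0.0398 | 0.0385 | 0.0197 |
| N2-125 | Normal | 0 | 10 | - | 0.0119 | 0.0163 | - |
| N2-125 | FSSN | 0 | 10 | 1 | 0.0050 | 0.0161 | 0.0436 |
| N2-250 | Normal | 0 | 10 | - | 0.0155 | 0.0206 | - |
| N2-250 | FSSN | 0 | 10 | 1 | 0.0064 | 0.0206 | 0.0315 |
| N2-500 | Normal | 0 | 10 | - | 0.0216 | 0.0306 | - |
| N2-500 | FSSN | 0 | 10 | 1 | 0.0087 | 0.0306 | 0.0241 |
| N2-1000 | Normal | 0 | 10 | - | 0.0295 | 0.0355 | - |
| N2-1000 | FSSN | 0 | 10 | 1 | 0.0128 | 0.0355 | 0.0178 |
| DE1-125 | DE | 0 | 1 | - | 0.0481 | 0.0470 | - |
| DE1-125 | FSSN | 0 | 1 | 1 | 0.0354 | 0.0649 | 0.0579 |
| DE1-250 | DE | 0 | 1 | - | 0.0368 | 0.0432 | - |
| DE1-250 | FSSN | 0 | 1 | 1 | 0.0397 | 0.0560 | 0.0532 |
| DE1-500 | DE | 0 | 1 | - | 0.0288 | 0.0376 | - |
| DE1-500 | FSSN | 0 | 1 | 1 | 0.0422 | 0.0454 | 0.0443 |
| DE1-1000 | DE | 0 | 1 | - | 0.0212 | 0.0302 | - |
| DE1-1000 | FSSN | 0 | 1 | 1 | 0.0387 | 0.0352 | 0.0377 |
| DE2-125 | DE | 0 | 4 | - | 0.0355 | 0.0245 | - |
| DE2-125 | FSSN | 0 | 4 | 1 | 0.0137 | 0.0484 | 0.0561 |
| DE2-250 | DE | 0 | 4 | - | 0.0415 | 0.0327 | - |
| DE2-250 | FSSN | 0 | 4 | 1 | 0.0191 | 0.0638 | 0.0466 |
| DE2-500 | DE | 0 | 4 | - | 0.0480 | 0.0428 | - |
| DE2-500 | FSSN | 0 | 4 | 1 | 0.0269 | 0.0767 | 0.0345 |
| DE2-1000 | DE | 0 | 4 | - | 0.0492 | 0.0508 | - |
| DE2-1000 | FSSN | 0 | 4 | 1 | 0.0357 | 0.0821 | 0.0279 |
| T1-125 | t | - | - | - | - | - | - |
| T1-125 | FSSN | 0 | 1.2910 | 1 | 0.0355 | 0.0769 | 0.0593 |
| T1-250 | t | - | - | - | - | - | - |
| T1-250 | FSSN | 0 | 1.2910 | 1 | 0.0468 | 0.0747 | 0.0572 |
| T1-500 | t | - | - | - | - | - | - |
| T1-500 | FSSN | 0 | 1.2910 | 1 | 0.0585 | 0.0633 | 0.0515 |
| T1-1000 | t | - | - | - | - | - | - |
| T1-1000 | FSSN | 0 | 1.2910 | 1 | 0.0678 | 0.0489 | 0.0492 |
| T2-125 | t | - | - | - | - | - | - |
| T2-125 | FSSN | 0 | 1.1832 | 1 | 0.0325 | 0.0609 | 0.0531 |
| T2-250 | t | - | - | - | - | - | - |
| T2-250 | FSSN | 0 | 1.1832 | 1 | 0.0440 | 0.0575 | 0.0489 |
| T2-500 | t | - | - | - | - | - | - |
| T2-500 | FSSN | 0 | 1.1832 | 1 | 0.0526 | 0.0460 | 0.0426 |
| T2-1000 | t | - | - | - | - | - | - |
| T2-1000 | FSSN | 0 | 1.1832 | 1 | 0.0520 | 0.0359 | 0.0361 |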
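Table 5. Coverage probability (CP) for the estimated parameters of 24 scenario simulations for normal (N), double-exponential (DE), and Student-t (T) distributions.

| Scenario | Distribution | Target $\mu$ | Target $\sigma$ | Target $\delta$ | CP of $\hat{\mu}$ (%) | CP of $\hat{\sigma}$ (%) | CP of $\hat{\delta}$ (%) |
|---|---|---|---|---|---|---|---|
| N1-125 | Normal | 0 | 2 | - | 100 | 100 | - |
| N1-125 | FSSN | 0 | 2 | 1 | 100 | 100 | 99 |
| N1-250 | Normal | 0 | 2 | - | 100 | 99 | - |
| N1-250 | FSSN | 0 | 2 | 1 | 100 | 99 | 99 |
| N1-500 | Normal | 0 | 2 | - | 99 | 98 | - |
| N1-500 | FSSN | 0 | 2 | 1 | 100 | 98 | 99 |
| N1-1000 | Normal | 0 | 2 | - | 99 | 97 | - |
| N1-1000 | FSSN | 0 | 2 | 1 | 100 | 97 | 100 |
| N2-125 | Normal | 0 | 10 | - | 100 | 100 | - |
| N2-125 | FSSN | 0 | 10 | 1 | 100 | 100 | 96 |
| N2-250 | Normal | 0 | 10 | - | 100 | 100 | - |
| N2-250 | FSSN | 0 | 10 | 1 | 100 | 100 | 98 |
| N2-500 | Normal | 0 | 10 | - | 100 | 100 | - |
| N2-500 | FSSN | 0 | 10 | 1 | 100 | 100 | 97 |
| N2-1000 | Normal | 0 | 10 | - | 100 | 100 | - |
| N2-1000 | FSSN | 0 | 10 | 1 | 100 | 100 | 97 |
| DE1-125 | DE | 0 | 1 | - | 97 | 99 | - |
| DE1-125 | FSSN | 0 | 1 | 1 | 100 | 91 | 93 |
| DE1-250 | DE | 0 | 1 | - | 97 | 99 | - |
| DE1-250 | FSSN | 0 | 1 | 1 | 100 | 85 | 91 |
| DE1-500 | DE | 0 | 1 | - | 96 | 97 | - |
| DE1-500 | FSSN | 0 | 1 | 1 | 99 | 81 | 91 |
| DE1-1000 | DE | 0 | 1 | - | 96 | 96 | - |
| DE1-1000 | FSSN | 0 | 1 | 1 | 99 | 76 | 86 |
| DE2-125 | DE | 0 | 4 | - | 100 | 100 | - |
| DE2-125 | FSSN | 0 | 4 | 1 | 100 | 100 | 89 |
| DE2-250 | DE | 0 | 4 | - | 100 | 100 | - |
| DE2-250 | FSSN | 0 | 4 | 1 | 100 | 99 | 87 |
| DE2-500 | DE | 0 | 4 | - | 100 | 100 | - |
| DE2-500 | FSSN | 0 | 4 | 1 | 100 | 96 | 88 |
| DE2-1000 | DE | 0 | 4 | - | 99 | 100 | - |
| DE2-1000 | FSSN | 0 | 4 | 1 | 100 | 89 | 86 |
| T1-125 | t | - | - | - | - | - | - |
| T1-125 | FSSN | 0 | 1.2910 | 1 | 100 | 90 | 92 |
| T1-250 | t | - | - | - | - | - | - |
| T1-250 | FSSN | 0 | 1.2910 | 1 | 100 | 83 | 87 |
| T1-500 | t | - | - | - | - | - | - |
| T1-500 | FSSN | 0 | 1.2910 | 1 | 97 | 76 | 82 |
| T1-1000 | t | - | - | - | - | - | - |
| T1-1000 | FSSN | 0 | 1.2910 | 1 | 92 | 77 | 76 |
| T2-125 | t | - | - | - | - | - | - |
| T2-125 | FSSN | 0 | 1.1832 | 1 | 100 | 93 | 95 |
| T2-250 | t | - | - | - | - | - | - |
| T2-250 | FSSN | 0 | 1.1832 | 1 | 100 | 91 | 93 |
| T2-500 | t | - | - | - | - | - | - |
| T2-500 | FSSN | 0 | 1.1832 | 1 | 99 | 86 | 92 |
| T2-1000 | t | - | - | - | - | - | - |
| T2-1000 | FSSN | 0 | 1.1832 | 1 | 96 | 85 | 90 |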
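The three simulation metrics reported in Tables 3–5 can be sketched as follows. This is a minimal illustration using the usual textbook definitions (mean error, root-mean-squared error, and the share of intervals containing the target). It is an assumption that the paper computes them in exactly this way across replications, and the function names and numbers below are hypothetical, not taken from the study.

```python
import math


def bias(estimates, target):
    """Mean of the replicated estimates minus the targeted value."""
    return sum(estimates) / len(estimates) - target


def rmse(estimates, target):
    """Root-mean-squared error of the replicated estimates."""
    squared = [(e - target) ** 2 for e in estimates]
    return math.sqrt(sum(squared) / len(squared))


def coverage_percent(intervals, target):
    """Percentage of (lower, upper) intervals that contain the target."""
    hits = sum(1 for lower, upper in intervals if lower <= target <= upper)
    return 100.0 * hits / len(intervals)


# Hypothetical replicated estimates of sigma with target value 2.
sigma_hats = [1.9, 2.1, 2.0, 2.2]
sigma_intervals = [(1.8, 2.0), (2.0, 2.2), (1.9, 2.1), (2.1, 2.3)]

print(bias(sigma_hats, 2.0))
print(rmse(sigma_hats, 2.0))
print(coverage_percent(sigma_intervals, 2.0))
```

A coverage well below the nominal 95% level, as seen for $\hat{\sigma}$ in the DE and t scenarios, indicates intervals that miss the target more often than intended.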
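Table 6. Eight scenario simulations for multivariate normal (MVN) distribution.

| Distribution | Scenario | Sample Size |
|---|---|---|
| $N_3\left(\mu_1 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\ \Sigma_1 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}\right)$ | MVN1-50 | 50 |
| | MVN1-100 | 100 |
| | MVN1-150 | 150 |
| | MVN1-200 | 200 |
| $N_3\left(\mu_2 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\ \Sigma_2 = \begin{pmatrix} 4 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 6 \end{pmatrix}\right)$ | MVN2-50 | 50 |
| | MVN2-100 | 100 |
| | MVN2-150 | 150 |
| | MVN2-200 | 200 |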
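Table 7. Euclidean distance and determinant of Euclidean distance matrix for the estimated parameters of eight scenario simulations for multivariate normal (MVN) distribution.

Targeted parameters for the MVN1 scenarios: $\mu = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}$, $\delta = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$.
Targeted parameters for the MVN2 scenarios: $\mu = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 4 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 6 \end{pmatrix}$, $\delta = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$.

| Scenario | Multivariate Distribution | Euclidean distance between $\mu$ and $\hat{\mu}$ | Determinant of the Euclidean distance matrix of $\Sigma$ and $\hat{\Sigma}$ | Euclidean distance between $\delta$ and $\hat{\delta}$ |
|---|---|---|---|---|
| MVN1-50 | Normal | 0.00566 | $4.3863 \times 10^{38}$ | - |
| MVN1-50 | FSSN | 0.00247 | $1.2720 \times 10^{39}$ | 0.00002 |
| MVN1-100 | Normal | 0.00019 | $7.1217 \times 10^{42}$ | - |
| MVN1-100 | FSSN | 0.00196 | $1.5471 \times 10^{40}$ | 0.00606 |
| MVN1-150 | Normal | 0.00301 | $4.5139 \times 10^{38}$ | - |
| MVN1-150 | FSSN | 0.00157 | $5.2645 \times 10^{39}$ | 0.00515 |
| MVN1-200 | Normal | 0.00538 | $3.1958 \times 10^{39}$ | - |
| MVN1-200 | FSSN | 0.00302 | $3.1340 \times 10^{41}$ | 0.00123 |
| MVN2-50 | Normal | 0.00753 | $1.9865 \times 10^{38}$ | - |
| MVN2-50 | FSSN | 0.00350 | $1.9462 \times 10^{37}$ | 0.00559 |
| MVN2-100 | Normal | 0.00294 | $6.0291 \times 10^{37}$ | - |
| MVN2-100 | FSSN | 0.00315 | $3.4961 \times 10^{37}$ | 0.00431 |
| MVN2-150 | Normal | 0.00163 | $4.0158 \times 10^{38}$ | - |
| MVN2-150 | FSSN | 0.00009 | $1.7077 \times 10^{38}$ | 0.00502 |
| MVN2-200 | Normal | 0.00421 | $3.0528 \times 10^{37}$ | - |
| MVN2-200 | FSSN | 0.00044 | $3.4680 \times 10^{38}$ | 0.00611 |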
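Table 8. Estimated Model 1Lip.

| Parameters | Mean | se_mean | Std Dev | HPD 2.5% | HPD 50% | HPD 97.5% | $n_{\mathrm{eff}}$ | $\hat{R}$ |
|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | −0.5420 | 0.0012 | 0.0708 | −0.6820 | −0.5420 | −0.4040 | 3264 | 1.0000 |
| $\beta_1$ | 0.7350 | 0.0011 | 0.0602 | 0.6150 | 0.7360 | 0.8530 | 3240 | 1.0000 |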
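Table 9. Estimated Model 2Lip.

| Parameters | Mean | se_mean | Std Dev | HPD 2.5% | HPD 50% | HPD 97.5% | $n_{\mathrm{eff}}$ | $\hat{R}$ |
|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | −0.1732 | 0.0029 | 0.1410 | −0.4541 | −0.1707 | 0.1007 | 2320 | 1.0000 |
| $\beta_1$ | 0.2973 | 0.0033 | 0.1510 | 0.0049 | 0.2968 | 0.5967 | 2041 | 1.0000 |
| $\phi_1$ | 1.2966 | 0.0033 | 0.3150 | 0.6451 | 1.3096 | 1.8945 | 8953 | 1.0000 |
| $\phi_2$ | 1.1695 | 0.0028 | 0.1950 | 0.7873 | 1.1717 | 1.5513 | 4784 | 1.0000 |
| $\phi_{56}$ | −0.5466 | 0.0035 | 0.3810 | −1.2947 | −0.5423 | 0.1808 | 11,606 | 1.0000 |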
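Table 10. Estimated Model 3Lip.

| Parameters | Mean | se_mean | Std Dev | HPD 2.5% | HPD 50% | HPD 97.5% | $n_{\mathrm{eff}}$ | $\hat{R}$ |
|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | −0.3136 | 0.0051 | 0.1950 | −0.6937 | −0.3146 | 0.0685 | 1440 | 1.0000 |
| $\beta_1$ | 0.3357 | 0.0058 | 0.2124 | 0.0838 | 0.3368 | 0.7456 | 1331 | 1.0000 |
| $\phi_1$ | 1.3822 | 0.0041 | 0.3209 | 0.7380 | 1.3912 | 1.9960 | 6043 | 1.0000 |
| $\phi_2$ | 1.2489 | 0.0046 | 0.2289 | 0.8046 | 1.2490 | 1.6959 | 2467 | 1.0000 |
| $\phi_{56}$ | −0.8360 | 0.0078 | 0.7391 | −2.4176 | −0.7870 | 0.4737 | 9094 | 1.0000 |
| $\delta_1$ | 1.0019 | 0.0010 | 0.0996 | 0.8077 | 1.0009 | 1.1967 | 10,114 | 1.0000 |
| $\delta_2$ | 1.0034 | 0.0010 | 0.1018 | 0.8043 | 1.0021 | 1.2032 | 10,434 | 1.0000 |
| $\delta_{56}$ | 0.9982 | 0.0010 | 0.0995 | 0.8035 | 0.9982 | 1.1972 | 10,039 | 1.0000 |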
Table 11. The widely applicable information criterion (WAIC) and leave-one-out (LOO) values for Model 1Lip, Model 2Lip, and Model 3Lip.

| Model | WAIC | LOO |
|---|---|---|
| Model 1Lip | 460.9 | 460.9 |
| Model 2Lip | 290.1 | 305.6 |
| Model 3Lip | 288.0 | 303.5 |
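For readers who want to see how a WAIC value like those in Table 11 is assembled, the sketch below follows the standard definition in Vehtari et al. [44]: the log pointwise predictive density minus the summed posterior variance of the pointwise log-likelihood, on the deviance scale. It assumes a matrix of pointwise log-likelihood values is available from the fitted model; the function name `waic` and the toy numbers are hypothetical. In practice, the R package `loo` provides `waic()` and `loo()` for Stan fits, and LOO via Pareto-smoothed importance sampling is more involved than this sketch.

```python
import math


def _log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    largest = max(values)
    total = sum(math.exp(v - largest) for v in values)
    return largest + math.log(total / len(values))


def _sample_variance(values):
    """Unbiased sample variance across posterior draws."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def waic(log_lik):
    """WAIC on the deviance scale; smaller values indicate a better model.

    log_lik[s][i] is log p(y_i | theta_s) for posterior draw s and
    observation i.
    """
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd += _log_mean_exp(column)
        p_waic += _sample_variance(column)
    return -2.0 * (lppd - p_waic)


# Toy input: two draws, two observations, no variation across draws.
toy = [[-1.0, -2.0], [-1.0, -2.0]]
print(waic(toy))  # 6.0
```

Because both criteria are on the deviance scale, the model with the smallest value is preferred, which is how Model 3Lip is ranked first in Table 11.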
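Table 12. Estimated Model 1Lung.

| Parameters | Mean | se_mean | Std Dev | HPD 2.5% | HPD 50% | HPD 97.5% | $n_{\mathrm{eff}}$ | $\hat{R}$ |
|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | −0.2430 | 0.0018 | 0.1155 | −0.4802 | −0.2387 | −0.0252 | 4107 | 1.0000 |
| $\beta_1$ | 0.0500 | 0.0003 | 0.0213 | 0.0089 | 0.0498 | 0.0937 | 4994 | 1.0000 |
| $h_1$ | −0.1300 | 0.0027 | 0.2509 | −0.6656 | −0.1129 | 0.3352 | 8597 | 1.0000 |
| $h_2$ | −0.0650 | 0.0021 | 0.2271 | −0.5439 | −0.0555 | 0.3679 | 11,583 | 1.0000 |
| $h_{44}$ | 0.0410 | 0.0021 | 0.2363 | −0.4393 | 0.0365 | 0.5167 | 12,192 | 1.0000 |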
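Table 13. Estimated Model 2Lung.

| Parameters | Mean | se_mean | Std Dev | HPD 2.5% | HPD 50% | HPD 97.5% | $n_{\mathrm{eff}}$ | $\hat{R}$ |
|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | −0.2340 | 0.0012 | 0.1397 | −0.5115 | −0.2340 | 0.0374 | 13,570 | 1.0000 |
| $\beta_1$ | 0.0340 | 0.0003 | 0.0345 | −0.0333 | 0.0340 | 0.1015 | 11,639 | 1.0000 |
| $h_1$ | −0.0190 | 0.0009 | 0.1619 | −0.4045 | −0.0050 | 0.3141 | 34,365 | 1.0000 |
| $h_2$ | −0.0280 | 0.0012 | 0.1635 | −0.4234 | −0.0090 | 0.2898 | 19,422 | 1.0000 |
| $h_{44}$ | 0.0190 | 0.0009 | 0.1618 | −0.3107 | 0.0050 | 0.3995 | 35,804 | 1.0000 |
| $\phi_1$ | −0.2610 | 0.0018 | 0.3451 | −0.9488 | −0.2560 | 0.3985 | 37,061 | 1.0000 |
| $\phi_2$ | 0.0970 | 0.0020 | 0.3316 | −0.5641 | 0.1010 | 0.7374 | 27,526 | 1.0000 |
| $\phi_{44}$ | 0.0260 | 0.0014 | 0.3024 | −0.5850 | 0.0322 | 0.6043 | 47,473 | 1.0000 |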
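Table 14. Estimated Model 3Lung.

| Parameters | Mean | se_mean | Std Dev | HPD 2.5% | HPD 50% | HPD 97.5% | $n_{\mathrm{eff}}$ | $\hat{R}$ |
|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | −0.3412 | 0.0092 | 0.2091 | −0.7600 | −0.3400 | 0.0619 | 514.5 | 1.0100 |
| $\beta_1$ | 0.0509 | 0.0026 | 0.0554 | −0.0570 | 0.0507 | 0.1601 | 3449.0 | 1.0100 |
| $h_1$ | −0.0412 | 0.0067 | 0.2582 | −0.6223 | −0.0138 | 0.4917 | 1505.6 | 1.0000 |
| $h_2$ | −0.0780 | 0.0076 | 0.2553 | −0.6864 | −0.0330 | 0.4014 | 1122.5 | 1.0100 |
| $h_{44}$ | −0.0099 | 0.0102 | 0.2934 | −0.6708 | −0.0012 | 0.6140 | 3835.2 | 1.0000 |
| $\phi_1$ | −0.1937 | 0.0087 | 0.3713 | −0.9347 | −0.1870 | 0.5203 | 3820.3 | 1.0000 |
| $\phi_2$ | 0.1120 | 0.0124 | 0.3903 | −0.6733 | 0.1170 | 0.8737 | 3996.1 | 1.0100 |
| $\phi_{44}$ | 0.1750 | 0.0123 | 0.4822 | −0.8131 | 0.1880 | 1.1019 | 3543.4 | 1.0000 |
| $\delta_1$ | 0.9992 | 0.0014 | 0.0997 | 0.8037 | 0.9990 | 1.1971 | 5045.0 | 1.0000 |
| $\delta_2$ | 1.0004 | 0.0014 | 0.0995 | 0.8075 | 1.0000 | 1.1969 | 5083.9 | 1.0000 |
| $\delta_{44}$ | 1.0038 | 0.00142 | 0.0981 | 0.8127 | 1.0000 | 1.1994 | 4799.9 | 1.0000 |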
Table 15. The WAIC and LOO values of Model 1Lung, Model 2Lung, and Model 3Lung.

| Model | WAIC | LOO |
|---|---|---|
| Model 1Lung | 231.5 | 234.3 |
| Model 2Lung | 223.4 | 230.3 |
| Model 3Lung | 220.1 | 222.7 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Rantini, D.; Iriawan, N.; Irhamah. Fernandez–Steel Skew Normal Conditional Autoregressive (FSSN CAR) Model in Stan for Spatial Data. Symmetry 2021, 13, 545. https://doi.org/10.3390/sym13040545
