A Linear Bayesian Updating Model for Probabilistic Spatial Classification

Categorical variables are common in spatial data analysis. Traditional analytical methods for deriving probabilities of class occurrence, such as kriging-family algorithms, have been hindered by the discrete characteristics of categorical fields. To address this challenge, this study introduces the theoretical background of the linear Bayesian updating (LBU) model for spatial classification through an expert system. The main purpose of this paper is to present the theoretical foundations of the LBU approach. Since the LBU idea originates from aggregating expert opinions and is not restricted by the conditional independence assumption (CIA), it may prove to be reasonably adequate for analyzing complex geospatial data sets, such as remote sensing images or area-class maps.


Introduction
Categorical spatial data, such as lithofacies, land-use/land-cover classifications, and mineralization phases, are widely investigated geographical and geological information sources. They are typically represented by mutually exclusive and collectively exhaustive classes and visualized as area-class maps [1]. In the geo-information context, Rao's quadratic diversity was used in [2] to measure the scale-dependent landscape structure. In the geological context, a spatial hidden Markov chain model was employed in [3] for the estimation of categorical variables in petroleum reservoirs. Among geostatistical models, the Markov chain random field (MCRF) theory [4] and the Markov chain sequential simulation (MCSS) algorithm [5] are common choices for the prediction of categorical spatial data. They have been widely used in spatially related fields, with satisfactory results. However, the MCRF approach is based on a conditional independence assumption (CIA), which may be inappropriate due to complex data interaction in a spatial context [6]. The Tau model [7] and the Nu expression [8] introduce additional weights to relax the assumption of conditional independence. These power or multiplication relationships between multi-point pre-posterior probabilities and two-point conditional probabilities involve some subjective guesswork and may not be suitable in real-world spatial analysis. In the generalized linear mixed model (GLMM) [6], intermediate, latent, spatially correlated, normal variables are assumed for the observable non-normal responses to account for spatial dependence information, and the random effects are always assumed to follow a normal distribution. Our concern here is whether the latent variables for different categories can be assumed to be independent of each other at the same location.
Generally speaking, the spatial classification problem can be regarded as combining two-point transition probabilities into a multi-point conditional probability. A formal introduction to most of the available approaches for aggregating probability distributions in geosciences can be found in [9]. Our task is to use the probability pooling method for spatial classification based on the pioneering work of [10,11]. We build on these earlier studies and interpret transition probabilities as expert opinions. The transition probabilities are obtained from the transiogram [12], a spatial measure.
The remainder of this paper is organized as follows. We begin by introducing the basic forms of the linear Bayesian updating (LBU) method in Section 2. We then give detailed proofs in Section 3 for some propositions introduced by [10,11] that have not yet been proven. A real-world case study is given in Section 4. Finally, conclusions and future challenges are discussed in Section 5.

Linear Bayesian Updating
Consider the spatial locations $x_0, x_1, \ldots, x_n$ in remote sensing images or area-class maps. We use $A$ and $D_1, \ldots, D_n$ to represent events in the sample spaces of the categorical random variables $C(x_0)$ and $C(x_1), \ldots, C(x_n)$, respectively, and $\bar{A}$ denotes the complementary event of $A$. In the case of categorical data, let $\mathcal{A}$ be the finite set of events in the sample space $\Omega$ such that the events $A_1, A_2, \ldots, A_K$ of $\mathcal{A}$ are mutually exclusive and collectively exhaustive. Obviously,
$$\sum_{j=1}^{K} P(A_j) = 1. \tag{1}$$
In the subsequent discussions, $A$ will be used as a general notation for $A_j$. We treat the $n$ neighboring events $D_1, D_2, \ldots, D_n$ as experts, and consider the conditional probabilities $P(A|D_i)$ as expert opinions $Q_i$ on the occurrence of $A$, an event of interest. The experts' opinions are regarded as random variables $Q_i$ whose values $q_i$, $1 \le i \le n$, are to be revealed to the decision maker (DM). The posterior probability of $A$ given $Q = q$, with $Q = (Q_1, \ldots, Q_n)$ and $q = (q_1, \ldots, q_n)$, is then $p^*(q)$.
The original LBU method was first proposed in [10] in statistical science in the form
$$p^*(q) = p + \sum_{i=1}^{n} \lambda_i (q_i - \mu_i), \tag{2}$$
with possibly negative weights $\lambda_i$ expressing the amount of correlation between each $Q_i$ and $A$. Here $p$ denotes the DM's prior probability of $A$ and $\mu_i$ denotes the mathematical expectation of $Q_i$. When $\mu_1 = \cdots = \mu_n = p$ and $\lambda_i \ge 0$, Equation (2) reduces to the linear opinion pool [13]
$$p^*(q) = \lambda_0 p + \sum_{i=1}^{n} \lambda_i q_i \quad \text{subject to} \quad \sum_{i=0}^{n} \lambda_i = 1, \tag{3}$$
where the DM is considered as one of the experts. Our LBU method closely follows that of [10], which is proved to be the only formula satisfying $\int p^*(q)\, dF(q) = p$ for all distributions $F$ with mean vector $\mu$. The distinguishing feature of our LBU model is that the random variable $Q_i$ is replaced by a transition probability, a measure of spatial continuity.
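The relationship between the LBU form and the linear opinion pool can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names and all numbers are hypothetical, and it assumes the LBU form p*(q) = p + Σ λ_i (q_i − µ_i) with the pool weight λ_0 = 1 − Σ λ_i.

```python
import numpy as np

def lbu_posterior(p, q, mu, lam):
    """LBU posterior: p*(q) = p + sum_i lam_i * (q_i - mu_i)."""
    q, mu, lam = (np.asarray(v, dtype=float) for v in (q, mu, lam))
    return float(p + np.dot(lam, q - mu))

def linear_opinion_pool(p, q, lam):
    """Linear opinion pool: lam_0 * p + sum_i lam_i * q_i, lam_0 = 1 - sum_i lam_i."""
    q, lam = np.asarray(q, dtype=float), np.asarray(lam, dtype=float)
    lam0 = 1.0 - lam.sum()  # weight given to the DM's own prior
    return float(lam0 * p + np.dot(lam, q))

# Hypothetical values: prior p, three expert opinions q, non-negative weights lam.
p = 0.3
q = [0.6, 0.2, 0.5]
lam = [0.2, 0.1, 0.3]

# With mu_1 = ... = mu_n = p, both formulas should give the same posterior.
post_lbu = lbu_posterior(p, q, [p, p, p], lam)
post_pool = linear_opinion_pool(p, q, lam)
print(post_lbu, post_pool)
```

With equal expectations µ_i = p the two functions agree, which illustrates why the linear opinion pool is a special case of the LBU form.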

Parameter Ranges for Linear Bayesian Updating
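Although the LBU model has been used in [10,11], we believe that several theoretical issues need to be resolved before the method can be developed further. A legitimate posterior probability can be obtained only when the $\lambda_i$ obey a number of inequalities [10]. Since $p^*(q)$ is a probability, it must satisfy $0 \le p^*(q) \le 1$, that is to say,
$$0 \le p + \sum_{i=1}^{n} \lambda_i (q_i - \mu_i) \le 1. \tag{4}$$
Through algebraic transformation, (4) can be simplified as
$$\sum_{i=1}^{n} \lambda_i \mu_i - p \le \sum_{i=1}^{n} \lambda_i q_i \le 1 - p + \sum_{i=1}^{n} \lambda_i \mu_i. \tag{5}$$
Suppose that all $\lambda_i$ are positive. Since $0 \le q_i \le 1$, we have $0 \le \sum_{i=1}^{n} \lambda_i q_i \le \sum_{i=1}^{n} \lambda_i$, so (5) is satisfied as long as $\sum_{i=1}^{n} \lambda_i \mu_i - p \le 0$ and $\sum_{i=1}^{n} \lambda_i \le 1 - p + \sum_{i=1}^{n} \lambda_i \mu_i$. Therefore, if the DM considers that all $\lambda_i$ are positive, the most common case, then they must be chosen so that
$$\sum_{i=1}^{n} \lambda_i \mu_i \le p \quad \text{and} \quad \sum_{i=1}^{n} \lambda_i (1 - \mu_i) \le 1 - p, \tag{6}$$
or equivalently, for $0 < p < 1$,
$$\max\left\{ \frac{\sum_{i=1}^{n} \lambda_i \mu_i}{p}, \; \frac{\sum_{i=1}^{n} \lambda_i (1 - \mu_i)}{1 - p} \right\} \le 1, \tag{7}$$
which can be regarded as a sufficient but not necessary condition of the LBU method. When all weights are positive, Equation (2) is a valid probabilistic model only when (6), or equivalently (7), is satisfied.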
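The admissibility of a set of non-negative weights can be checked directly. The sketch below is illustrative only: the function names and numbers are hypothetical. It relies on the fact that, for non-negative weights, p + Σ λ_i (q_i − µ_i) attains its extremes over 0 ≤ q_i ≤ 1 at q = (0, ..., 0) and q = (1, ..., 1), which gives the two conditions Σ λ_i µ_i ≤ p and Σ λ_i (1 − µ_i) ≤ 1 − p.

```python
import numpy as np

def posterior_range(p, mu, lam):
    """Smallest and largest LBU posterior over all q in [0, 1]^n (lam >= 0)."""
    mu, lam = np.asarray(mu, dtype=float), np.asarray(lam, dtype=float)
    if np.any(lam < 0):
        raise ValueError("this check assumes non-negative weights")
    lowest = p - np.dot(lam, mu)         # reached when every q_i = 0
    highest = p + np.dot(lam, 1.0 - mu)  # reached when every q_i = 1
    return float(lowest), float(highest)

def weights_are_valid(p, mu, lam):
    """True if the posterior stays inside [0, 1] for every possible q."""
    lowest, highest = posterior_range(p, mu, lam)
    return lowest >= 0.0 and highest <= 1.0

# Hypothetical prior and expert expectations.
p, mu = 0.3, [0.3, 0.3]
ok = weights_are_valid(p, mu, [0.4, 0.4])   # posterior range is about [0.06, 0.86]
bad = weights_are_valid(p, mu, [0.8, 0.8])  # lower bound falls below zero
print(ok, bad)
```

Weights that fail this check can still produce a number from the linear formula, but that number may fall outside [0, 1] for some combination of expert opinions.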

Interpreting Parameters as Regression Coefficients for Linear Bayesian Updating
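The LBU model given above has parameters that need to be learned or estimated by the DM. Let $\Sigma_Q$ denote the covariance matrix of $Q$, and let $\sigma_{p^*Q}$ be the vector of covariances between $p^*$ and $Q$. Let the superscript $t$ denote matrix transposition. Since $E[p^*] = p$ and $E[Q] = \mu$, the definition of covariance gives
$$\sigma_{p^*Q} = E\left[(Q - \mu)(p^* - p)\right].$$
In addition, Equation (2) can be written as
$$p^* - p = (Q - \mu)^t \lambda, \qquad \lambda = (\lambda_1, \ldots, \lambda_n)^t.$$
Multiplying both sides by $(Q - \mu)$ and taking expectations yields $\sigma_{p^*Q} = \Sigma_Q \lambda$, that is,
$$\lambda = \Sigma_Q^{-1} \sigma_{p^*Q},$$
provided that the covariance matrix $\Sigma_Q$ is invertible. Suppose we have $m$ samples in the training set, and consider the regression model
$$p^*_k - p = \sum_{i=1}^{n} \lambda_i (q_{ki} - \mu_i) + \varepsilon_k, \qquad k = 1, \ldots, m,$$
where $p^*_k$ and $q_{ki}$ are the values of $p^*$ and $Q_i$ for the $k$th sample and $\varepsilon_k$ denotes the random errors. Provided that the experts are not linearly dependent, least squares estimation of the regression coefficients yields the normal equations
$$\sum_{k=1}^{m} (q_{kj} - \mu_j) \left[ (p^*_k - p) - \sum_{i=1}^{n} \hat{\lambda}_i (q_{ki} - \mu_i) \right] = 0, \qquad j = 1, \ldots, n,$$
which can be written in matrix form as
$$\hat{\lambda} = (X^t X)^{-1} X^t y,$$
where $X$ is the $m \times n$ matrix with entries $q_{ki} - \mu_i$ and $y$ is the $m$-vector with entries $p^*_k - p$. When the sample size is large enough, sample means are approximately equal to the corresponding expectations, so that $\frac{1}{m} X^t X \approx \Sigma_Q$ and $\frac{1}{m} X^t y \approx \sigma_{p^*Q}$, and we have
$$\hat{\lambda} \approx \Sigma_Q^{-1} \sigma_{p^*Q}.$$
Therefore, our derivation explains the parameters $\lambda_i$ in the LBU model as the linear regression coefficients of $p^* - p$ with respect to $Q - \mu$ when the neighboring events are not linearly dependent. As in multiple regression, each $\lambda_i$ can thus be thought of as a measure of the additional information that the $i$th expert provides over and above the other experts and what the DM already knows.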
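The regression interpretation can be illustrated on simulated data. This is a sketch under stated assumptions, not the procedure used in the case study: the expert opinions, the noise level and `true_lam` are all hypothetical, and the expectation µ is approximated by the sample mean of the simulated opinions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 3                              # hypothetical sample size and number of experts
p = 0.3                                     # hypothetical prior probability
true_lam = np.array([0.25, -0.10, 0.40])    # hypothetical weights used to simulate data

Q = rng.uniform(0.0, 1.0, size=(m, n))      # simulated expert opinions
mu = Q.mean(axis=0)                         # sample mean stands in for E[Q]
noise = rng.normal(0.0, 0.01, size=m)
p_star = p + (Q - mu) @ true_lam + noise    # simulated posteriors with small errors

# Least squares estimate: regress p* - p on Q - mu (no intercept).
X, y = Q - mu, p_star - p
lam_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Covariance form: solve Sigma_Q * lam = sigma_{p*Q}.
Sigma_Q = np.cov(Q, rowvar=False)
sigma_pQ = np.array([np.cov(p_star, Q[:, i])[0, 1] for i in range(n)])
lam_cov = np.linalg.solve(Sigma_Q, sigma_pQ)

print(lam_ls)
print(lam_cov)
```

Because the regressors are centered at their sample mean, the two estimates coincide up to floating-point error, and both should be close to the weights used to generate the data.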

Invertible Conditions of Linear Bayesian Updating
We now discuss when the linear system of Equation (2) is invertible. Equation (2) can be rewritten in matrix form with a coefficient matrix $\Lambda$. Since the determinant $|\Lambda| = 0$, $\Lambda$ is not invertible. Therefore, $q_1, q_2, \ldots, q_n$ cannot be uniquely determined, and the linear system of Equation (2) is not invertible under these circumstances.
The linear system of Equation (2) is invertible only when a single expert is consulted, i.e., $p^*(q) = p + \lambda (q - \mu)$.
In this case, provided that $\lambda \neq 0$, the left inverse of the system is
$$q = \mu + \frac{p^* - p}{\lambda},$$
where $p^*$ is the input. The right inverse can be given as
$$q_i = \mu_i + \frac{p^*(q_i) - p}{\lambda_i},$$
where $p^*(q_i)$ is the desired output after consultation. The required $q_i$ can be compared with the corresponding transition probability $P(A|D_i)$. If a large deviation emerges, the expert $Q_i$ may not be convincing.
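The single-expert case can be sketched as a forward map and its inverse. The function names and numbers below are hypothetical, and the inverse assumes a non-zero weight; the deviation at the end mirrors the comparison between the required opinion and an observed transition probability.

```python
def forward(q, p, lam, mu):
    """Single-expert LBU: p*(q) = p + lam * (q - mu)."""
    return p + lam * (q - mu)

def inverse(p_star, p, lam, mu):
    """Opinion q that would be needed to reach a given posterior p*."""
    if lam == 0:
        raise ValueError("the single-expert system is not invertible when lam = 0")
    return mu + (p_star - p) / lam

# Hypothetical prior, weight, expectation and desired posterior.
p, lam, mu = 0.3, 0.5, 0.4
target = 0.45
q_needed = inverse(target, p, lam, mu)     # 0.4 + 0.15 / 0.5
recovered = forward(q_needed, p, lam, mu)  # should reproduce the target

observed_transition_prob = 0.65            # hypothetical P(A | D_i)
deviation = abs(q_needed - observed_transition_prob)
print(q_needed, recovered, deviation)
```

A large value of `deviation` would suggest that the expert's opinion is far from what the desired posterior requires.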

Case Study
We now present a case study to demonstrate the use of the method. The Swiss Jura data set [14] is used, where four lithology types are sampled in a 14.5 km² region. These rock types are Argovian, Kimmeridgian, Sequanian and Quaternary; the corresponding class proportions of these four categories are 20.46%, 32.82%, 24.32% and 22.39%, respectively. We have 259 samples in total for prediction (Figure 1).
Challenges 2016, 7, 21

The first task is to obtain the expert opinions (i.e., transition probabilities) in spatial scenarios. We use the 10 nearest samples for prediction, so there are always 10 experts for consultation. The detailed procedures for estimating the transition probability are beyond the scope of this work; discussions of transiogram fitting can be found in [1,11]. We only show the descriptive statistics of the transition probabilities in Table 1.
After obtaining the expert opinions, we can use the regression model represented by Equation (2) to estimate the linear weights in spatial classification. Given that multiple neighbors will be involved in spatial scenarios most of the time, the LBU should often be a multivariable linear regression model. With the estimated regression coefficients, we can use the maximum a posteriori (MAP) probability criterion for classification [11].
The final prediction results are shown in Figure 2. We obtain an overall classification accuracy of 82.63% (214 out of 259). To better reflect the prediction accuracy, the precision for each lithofacies is illustrated in Figure 3.
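The MAP step can be sketched as follows. This is an illustration rather than the code behind the reported results: the transition probabilities and weights are hypothetical, only two neighbors are used instead of ten for brevity, and the reported class proportions are assumed to serve as both the priors and the expert expectations.

```python
import numpy as np

classes = ["Argovian", "Kimmeridgian", "Sequanian", "Quaternary"]
prior = np.array([0.2046, 0.3282, 0.2432, 0.2239])  # class proportions from the text

def map_classify(prior, q, mu, lam):
    """Return the index of the class with the largest LBU posterior.

    q, mu and lam have shape (K, n): one row per class, one column per neighbor.
    """
    posterior = prior + np.sum(lam * (q - mu), axis=1)
    return int(np.argmax(posterior)), posterior

# Hypothetical transition probabilities P(A_k | D_i) for two neighbors.
q = np.array([[0.10, 0.15],
              [0.70, 0.60],
              [0.10, 0.15],
              [0.10, 0.10]])
mu = np.tile(prior[:, None], (1, 2))  # assumption: expectations equal the priors
lam = np.full((4, 2), 0.3)            # hypothetical weights

best, posterior = map_classify(prior, q, mu, lam)
predicted = classes[best]

# Overall accuracy as reported in the text: 214 correct out of 259.
accuracy_percent = round(100 * 214 / 259, 2)
print(predicted, accuracy_percent)
```

In practice the weights would come from the fitted regression for each class, and the same argmax rule would be applied at each of the 259 locations.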

Conclusions
In this work, we consolidate the theoretical foundations of the LBU model for the prediction of categorical spatial data. We have extended our previous findings [11] by adding rigorous theoretical proofs for the LBU method. To show how the LBU model can work in spatial settings, a real-world case study has also been carried out. As pointed out by [11], our method can also be generalized to nonlinear systems, where more confident probability forecasting results can be obtained.
In the proposed model, the choice of neighborhood size can be regarded as a variable selection problem. Involving more neighboring samples is likely to boost the prediction accuracy on the training set, but it may be computationally intensive and accompanied by a higher generalization error, that is, overfitting. Determining the optimal number of neighbors will be the focus of our future work.


Figure 1. Jura lithology data set with four classes.


Figure 2. Lithofacies prediction of the corresponding 259 locations based on the MAP probability criterion.


Table 1. Descriptive statistics of transition probabilities (expert opinions).