Multiple Artificial Neural Networks with Interaction Noise for Estimation of Spatial Categorical Variables

Abstract: This paper presents a multiple artificial neural networks (MANN) method with interaction noise for estimating the occurrence probabilities of different classes at any site in space. The MANN consists of several independent artificial neural networks, the number of which is determined by the neighbors around the target location. In the proposed algorithm, the conditional or pre-posterior (multi-point) probabilities are viewed as output nodes, which can be estimated by weighted combinations of input nodes: two-point transition probabilities. The occurrence probability of a certain class at a certain location can then be computed as a product of output probabilities using Bayes' theorem. Spatial interaction or redundancy information can be measured in the form of interaction noises. Prediction results show that the method of MANN with interaction noise has a higher classification accuracy than the traditional Markov chain random fields (MCRF) model and can successfully preserve small-scale features.


Introduction
Categorical spatial data, such as area-class maps and remote sensing images, are very common and important information sources in geographic information science [1]. Many approaches have been developed to model the uncertainty in categorical random fields, including but not limited to Markov chain random fields (MCRF) [2,3], the multinomial logistic mixed model [4], the spatial hidden Markov chain [5] and the Bayesian updating model [6]. Artificial neural networks (ANNs) [7] have also been extensively used as non-linear and semi-parametric pattern classifiers and function estimators (e.g., classification and regression tools) in geographic information systems and regional science since the 1990s [8-11]. Civco [12] described the application of ANNs to the problem of deriving land-cover information from Landsat Thematic Mapper digital imagery. Skabar [13] reported on the application of ANNs to mapping reef gold mineralization potential and showed that the ability of ANNs to predict the presence of hold-out test deposits is significantly better than that of logistic regression. Conceptually, ANNs are well suited to processing noisy data and handling non-linear modeling tasks [14], and they seem to offer methodological advantages over traditional spatial analysis methods in that ANNs make no critical assumptions about the nature of spatial data. A priori knowledge and data on known physical constraints can also be incorporated into the ANN interpolation process [15,16], while traditional interpolation methods such as kriging seem to lack the flexibility to incorporate such important general and case-specific sources.
The idea of our multiple artificial neural networks (MANN) model with interaction noise is to apply a linear function to the input nodes, i.e., the (two-point) transition probabilities, and to regard the (multi-point) pre-posterior probabilities as derived features obtained via nonlinear transformation. We need to construct multiple ANNs to obtain the target multi-point posterior probabilities by employing Bayes' theorem. The interaction noises are used for estimating the multi-point pre-posterior probabilities in the MANN training processes. The originality of the proposed solution lies in the fact that no additional assumptions are made about the relationship between the multi-point pre-posterior probabilities and the two-point transition probabilities, which is a step forward with respect to existing methods. In addition, we also propose a method for measuring spatial interaction effects. The remainder of this paper is organized as follows. We give an overview and formulate the method of MANN with interaction noise in Section 2. To demonstrate our model, an artificial and a real-world case study are carried out in Section 3. Conclusions and directions for future research are presented in Section 4.

Method
Consider a categorical random variable (RV) C(x_0) which can take one out of K mutually exclusive and collectively exhaustive states c(x_0) ∈ {1, ⋯, K} at any arbitrary location with coordinate x_0. When there is no other information, we assume that the probability mass function (PMF) of RV C(x_0) is stationary and can be approximated by the K global class proportions π_1, ⋯, π_K. A central task in the prediction and simulation of categorical random fields is the estimation of the conditional PMF of C(x_0) in the presence of observed class labels c(x_1), ⋯, c(x_N) at N neighboring locations x_1, ⋯, x_N. Our task can now be re-stated as that of estimating the conditional PMF P{C(x_0) = c(x_0) | C(x_1) = c(x_1), ⋯, C(x_N) = c(x_N)}. To simplify notation, we use A and D_1, ⋯, D_N to represent the events in the sample spaces of C(x_0) and C(x_1), ⋯, C(x_N), respectively, and Ā to denote the complementary event of A. ANNs offer an approach to geostatistical simulation with the possibility of automatic recognition of the correlation structure. We consider f_i = P(D_j|A), j ∈ {1, 2, ⋯, i}, as the input node, and thus the pre-posterior (multi-point) probabilities can be regarded as the desired output nodes; the target (multi-point) posterior probability P(A|D_1 D_2 ⋯ D_N) is then assembled from these outputs. The continuously differentiable sigmoid functions are the most common form of activation function in ANNs [17], and the learning process is achieved by using feed forward and back propagation algorithms. Let ŷ_i = P(D_i|A D_1 ⋯ D_{i−1}) be the actual output obtained on one particular iteration after f_i has been fed forward through the network. The current total error, which is defined as a function of the connection weights, is

E(W) = (1/2) Σ_i (y_i − ŷ_i)²,  (1)

where W is the matrix of connection weights and y_i is the desired output.
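As a minimal sketch of the feed forward pass just described, a single hidden layer with sigmoid activations and a sum-of-squares error over the outputs, the following Python fragment may be helpful; the function names and weight layout are illustrative assumptions rather than the paper's implementation:

```python
import math

def sigmoid(z):
    # Logistic (sigmoid) activation: continuously differentiable
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, w_hidden, w_output):
    """One feed-forward pass of a single-hidden-layer network.

    inputs   : input-node values (two-point transition probabilities), with
               inputs[0] = 1.0 acting as the clamped bias unit.
    w_hidden : w_hidden[h][j] is the weight from input j to hidden unit h.
    w_output : w_output[h] is the weight from hidden unit h to the output.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_output, hidden)))

def total_error(desired, actual):
    # Sum-of-squares error E(W) accumulated over the training outputs
    return 0.5 * sum((y - yhat) ** 2 for y, yhat in zip(desired, actual))
```

In a full implementation, back propagation would update w_hidden and w_output by gradient descent on this error.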
In most traditional neural networks, the desired output nodes are considered as known response variables. Thus, the central work of an ANN is to train the appropriate weights in the hidden layers. However, the desired outputs of our model cannot be obtained from the available training set directly. We use MANN with interaction noise to solve this problem. Just as in the MCRF, the Tau (τ) model [18] and the Nu (ν) expression [19], the multi-point probabilities are linked to two-point quantities; here we assume

ln P(D_i|A D_1 ⋯ D_{i−1}) = ln P(D_i|A) + ε_i,  (2)

where ln denotes the natural logarithm and ε_i is a random noise which encapsulates the interaction information from D_1 to D_i. Spearman correlation coefficients are used to measure the interdependence from P(D_1|A) to P(D_i|A); the one that has the most significant correlation with P(D_i|A) is chosen to be the input node. A neural network can then be represented in terms of nested transformations of linear combinations of input units, where the initial input units are the predictor variables themselves, so that the overall network function with H hidden units and activation functions φ and ψ becomes

ŷ_i = ψ( Σ_{h=1}^{H} ω_h^(2) φ( Σ_j ω_{hj}^(1) P(D_j|A) ) ),  (3)

where ω^(1) and ω^(2) indicate the weights in the inner and outer layer, respectively, and the bias parameters are absorbed into the set of weight parameters by defining an additional input variable P(D_0|A) whose value is clamped at P(D_0|A) = 1. Using Bayes' theorem (or the definition of conditional probability), we can decompose the conditional PMF as follows:

P(A|D_1 D_2 ⋯ D_N) = P(D_1) P(A|D_1) ∏_{i=2}^{N} P(D_i|A D_1 ⋯ D_{i−1}) / P(D_1 D_2 ⋯ D_N),  (4)

where x_1 is the nearest neighbor of x_0 among the N locations. Therefore, we can compute the posterior occurrence probabilities of the different classes according to Equation (4) and then estimate the category at the target location based on the "maximum a posteriori" criterion. The proposed approach can be coded easily on a computer, and its flowchart is summarized in Figure 1.
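The Bayes decomposition and the "maximum a posteriori" rule can be sketched in a few lines; the dictionary layout and function names below are illustrative assumptions:

```python
def posterior_pmf(p_a_given_d1, preposterior):
    """Posterior class probabilities via the sequential Bayes decomposition.

    p_a_given_d1 : {class k: P(C(x0)=k | D_1)}, the two-point transition
                   probabilities from the nearest neighbor x_1.
    preposterior : {class k: [P(D_i | k, D_1..D_{i-1}) for i = 2..N]}, the
                   network-estimated pre-posterior (multi-point) probabilities.
    """
    unnorm = {}
    for k, p1 in p_a_given_d1.items():
        prod = p1
        for p in preposterior[k]:
            prod *= p
        unnorm[k] = prod
    z = sum(unnorm.values())  # normalization via the sum rule
    return {k: v / z for k, v in unnorm.items()}

def map_class(pmf):
    # "Maximum a posteriori" criterion for the predicted category
    return max(pmf, key=pmf.get)
```

Note that the shared factors P(D_1) and P(D_1 ⋯ D_N) never need to be evaluated explicitly: they are common to all K classes and cancel when the unnormalized products are rescaled to sum to one.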


One of the commonly used methods to compute the multi-point posterior PMF is the MCRF model. The highlight of this theory is that there is only a single spatial Markov chain that moves (or jumps) in a space, which avoids the small-class underestimation problem in modeling the categorical random field [2]. Nevertheless, the multi-point probabilities P(D_i|A D_1 ⋯ D_{i−1}) are hard to estimate directly from sparse samples. Hence, the general solution of the MCRF model is deduced based on the conditional independence assumption, which is

P(A|D_1 D_2 ⋯ D_N) ∝ P(A|D_1) ∏_{i=2}^{N} P(D_i|A).  (5)

By plugging Equations (3) and (5) into Equation (4), the posterior PMF can be determined. Note that P(A|D_1) is the two-point transition probability and can be estimated from paired samples. Additionally, P(D_1) and P(D_1 D_2 ⋯ D_N) are normalization constants and should be computed using the sum rule, since they do not contain the unknown event A.
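Under the conditional independence assumption, each pre-posterior factor collapses to its two-point counterpart P(D_i|A), so the MCRF general solution can be sketched as follows (the function name and dictionary layout are illustrative assumptions):

```python
def mcrf_posterior(p_a_given_d1, p_di_given_a):
    """MCRF solution under conditional independence (Equation (5)):
    P(A|D_1..D_N) is proportional to P(A|D_1) * prod_{i=2..N} P(D_i|A).

    p_a_given_d1 : {class k: P(C(x0)=k | D_1)}
    p_di_given_a : {class k: [P(D_i | k) for i = 2..N]}, two-point only
    """
    unnorm = {}
    for k, p1 in p_a_given_d1.items():
        prod = p1
        for p in p_di_given_a[k]:
            prod *= p  # no interaction terms: neighbors treated as independent
        unnorm[k] = prod
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}
```

The only difference from the full decomposition is that the product runs over two-point probabilities, which is exactly the redundancy the interaction noise is meant to recover.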

Synthetic Case Study
We first illustrate our method on a synthetic case study with three categories (Figure 2). There are 606 locations that have been randomly sampled from this image. The prediction is conducted over a 64 × 64 grid covering the square. The marginal probability vector of occurrence is (0.2756, 0.4653, 0.2591). The categories may represent geological facies, soil types, land uses or any other categorical variable [20]. As a first task, the data set is used to estimate the two-point input transition probabilities. Experimental transition probabilities can be directly estimated from paired sampling data by counting the transition frequencies of classes at different lags [21]. A continuous transiogram [22] can be obtained via interpolation [23] or by fitting the transition probability cloud with various models. Following [6], the exponential model is employed for transiogram fitting. After all these preparations, the input vectors of the ANNs can be determined.
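A rough sketch of both preparation steps, counting experimental transition frequencies from paired samples and evaluating an exponential transiogram model, is given below; the exact parameterization of the exponential model (range reached at three times the decay constant) and all function names are assumptions of this sketch:

```python
import math

def experimental_transiogram(pairs, k_from, k_to, lags, tol):
    """Experimental transition probabilities p_{k_from,k_to}(h), estimated by
    counting class transitions of sample pairs binned by separation distance.

    pairs : list of (distance, class_at_head, class_at_tail) tuples.
    Returns one probability per lag, or None where the bin is empty.
    """
    probs = []
    for h in lags:
        tails = [c2 for d, c1, c2 in pairs if abs(d - h) <= tol and c1 == k_from]
        probs.append(sum(1 for c in tails if c == k_to) / len(tails) if tails else None)
    return probs

def exp_auto(h, pi_k, r):
    # Exponential auto-transiogram: 1 at lag 0, decaying to the proportion pi_k
    return pi_k + (1.0 - pi_k) * math.exp(-3.0 * h / r)

def exp_cross(h, pi_k, r):
    # Exponential cross-transiogram: 0 at lag 0, rising to the tail proportion pi_k
    return pi_k * (1.0 - math.exp(-3.0 * h / r))
```

Fitting then amounts to choosing the effective range r (and sill) so the model curve tracks the experimental cloud.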
We consider the four nearest sampling points as the neighbors of the unknown location. To use Equation (3), we must train three compositional ANNs to compute the pre-posterior (multi-point) probabilities. The problem here is the choice of the input nodes. By analyzing the correlation plot (Figure 3) of these four transition probabilities, no significant correlation is found among the input vectors; therefore, we choose P(D_i|A) as the input of P(D_i|A D_1 ⋯ D_{i−1}), respectively. As a second task, Gaussian-distributed interaction noises with different means and variances are simulated. For the 606 training data set, the output nodes P(D_i|A D_1 ⋯ D_{i−1}) can be computed via Equation (2). For comparison purposes, we use 10 groups of parameters for estimating and simulating the spatial categorical data in the remaining 3490 locations. We find in Table 1 that when the parameters are specified by μ = −3 and σ = 1.2, the prediction has the highest overall classification accuracy with 56.45% (1970 out of 3490). From the interpolation map generated by MANN with interaction noise (Figure 4), we find that clear inter-class boundaries have been shaped and spatial patch patterns have been retained. To demonstrate the superiority in terms of accuracy, MCRF, as a competing method, has been used for comparison. This counterpart, however, only correctly classifies 1936 samples out of the 3490 locations. Based on this result, MANN with interaction noise performs better than MCRF in terms of prediction accuracy on this artificial data set.
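Assuming the interaction noise enters log-additively as in Equation (2), the desired output nodes for the training set could be generated as follows; the clipping to (0, 1] and the function name are assumptions of this sketch:

```python
import math
import random

def preposterior_targets(two_point, mu, sigma, rng):
    """Desired output nodes under the log-additive noise model:
    ln P(D_i|A D_1..D_{i-1}) = ln P(D_i|A) + eps_i,  eps_i ~ N(mu, sigma^2).

    two_point : list of two-point probabilities P(D_i|A) for the chosen inputs.
    rng       : a random.Random instance, so runs are reproducible.
    The result is clipped to (0, 1] so each target remains a probability.
    """
    targets = []
    for p in two_point:
        eps = rng.gauss(mu, sigma)
        targets.append(min(1.0, math.exp(math.log(p) + eps)))
    return targets
```

With a strongly negative mean such as μ = −3, the generated targets tend to be much smaller than the two-point inputs, mimicking strong redundancy among the neighbors; trying several (μ, σ) pairs and keeping the most accurate one corresponds to the grid search reported in Table 1.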


Real-World Case Study
To further investigate the performance of the proposed method in real cases, the lithology types in the well-known Swiss Jura data set [24] are used, in which five rock types are sampled at 359 locations in a 14.5 km² area (Figure 5). The class proportions of these five categories are (0.2046, 0.3282, 0.2432, 0.0116, 0.2124), respectively. We focus on MANN with sigmoid activation functions and consider the 10 nearest samples as the neighbors of the target point. For the proposed method, we choose the candidate that has the maximum correlation with the output probabilities as the input, and we have trained nine compositional ANNs. We furthermore assume that the random interaction noise ε = (ε_1, ε_2, ⋯, ε_9) is a Gaussian random field. By trial and error, we find that the model which has the highest classification accuracy can be specified by mean μ = 0 and standard deviation σ = 0.1; that is, ε_i ~ N(0, 0.1²). Correlation coefficients between the various transition probabilities are compared with each other and the maximum correlation pairs are shown in Figure 6, where the red panels (Figure 6d,h) are worthy of discussion. Note that the correlation between P(D_4|A) and P(D_5|A) is quite weak with r = 0.2364, and the relevance of P(D_1|A) to P(D_3|A) with P(D_5|A) is even lower. In such a case, we choose P(D_5|A) itself as the input node of P(D_5|A D_1 ⋯ D_4), since no significant interaction information is found from D_1 to D_5. In the same way, P(D_9|A) is considered as the input node of P(D_9|A D_1 ⋯ D_8). It is rather remarkable that the maximum correlation pairs may not always be P(D_i|A) and P(D_{i−1}|A), just as Figure 6g displays: P(D_6|A) rather than P(D_7|A) is chosen as the input node for the output P(D_8|A D_1 ⋯ D_7), which means that the main interaction information among D_1, D_2, ⋯, D_8 is encapsulated in D_6 and D_8. In general, the input node of P(D_i|A D_1 ⋯ D_{i−1}) should be P(D_{i−1}|A) if the correlation coefficient between P(D_{i−1}|A) and P(D_i|A) exceeds 0.5. The R package "nnet", designed under the framework of feed forward and back propagation algorithms, is used during the training process. We consider three units in the hidden layer. The training weights of MANN with interaction noise are given in Table 2. As can be seen from Figure 7, the nine ANNs converge after a certain number of iterations. The training error of MANN with interaction noise can be viewed as the summation over the nine ANNs, which are illustrated in Figure 7 by different colors and line styles. Our original image (Figure 8a) is obtained from the "jura.grid" data set kindly provided by R software 3.2.5, and the known 5957 grid values are used as the validation set. We find that MANN with interaction noise has a classification accuracy of 65.12% (3879 out of 5957) and can successfully preserve small-scale Quaternary patch features (Figure 8b). We have also used the MCRF method for comparison. Not surprisingly, the latter yields a relatively lower classification accuracy not only in the overall prediction, but also in most of the single categories (Table 3). This result is comprehensible, since the general solution of MCRF is based on the conditional independence assumption, which fails to measure the interaction or redundancy information among the neighboring sources. This important knowledge, however, has been successfully incorporated in the form of interaction noise in the proposed MANN model.
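The input-node selection rule, picking the candidate two-point series with the largest Spearman correlation to the output and falling back to the output's own two-point probability when nothing exceeds 0.5, can be sketched as follows (hypothetical names; the rank computation below ignores tie correction):

```python
import math

def spearman(x, y):
    """Spearman rank correlation for equal-length samples (no tie correction)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def choose_input(candidates, target, threshold=0.5):
    """Pick the candidate series most correlated with the target series.

    candidates : {name: series of P(D_j|A) values}
    Returns the best candidate's name, or None when no candidate's absolute
    correlation exceeds the threshold (use the target's own two-point series).
    """
    best, best_r = None, threshold
    for name, series in candidates.items():
        r = abs(spearman(series, target))
        if r > best_r:
            best, best_r = name, r
    return best
```

This reproduces the behavior discussed above: weakly correlated neighbors such as D_5 and D_9 fall back to their own two-point probabilities, while a strongly correlated series such as D_6 can be selected for the D_8 output.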

Conclusions
In this paper, a new prediction/interpolation algorithm, MANN with interaction noise, is proposed for modeling and reproducing spatial patterns in categorical random fields. Connection weights can be estimated in the framework of the feed forward and back propagation method. Our experiments on a synthetic data set and the Jura lithology data set show that the proposed method achieves a higher classification accuracy than the MCRF model and successfully preserves small-scale features.


Figure 1. Flowchart of the proposed method.

Figure 2. Reference map with dimension 64 × 64 and three categories.


Figure 4. Prediction map generated by MANN with interaction noise.

Figure 6. Maximum correlation coefficients between various transition probabilities.

Figure 7. Training errors in the iteration process of MANN with interaction noise; different colors and line styles are used to distinguish the nine compositional ANNs.

Algorithms 2016, 9, 56
Figure 8. (a) Original image obtained from the R software data set (jura.grid); (b) classification map generated by MANN with interaction noise. Simulations are conditioned on 359 samples in the Jura lithology data set; note that the small Quaternary patches (visualized in black) neglected in (a) are successfully recovered in the result of the proposed method (b).

Table 1. Prediction accuracy comparison for the raster data set. μ and σ denote the mean and standard deviation, respectively.


Table 2. Training weights of MANN with interaction noise. Input node, hidden nodes and output node are represented by i, h1 to h3 and o, respectively; b1 and b2 are the bias nodes in the hidden layers.


Table 3. Prediction accuracy comparison for the Jura data set.
