# Bayesian Networks for Raster Data (BayNeRD): Plausible Reasoning from Observations


## Abstract


## 1. Introduction

## 2. Bayesian Networks

We use upper-case letters (e.g., $V_1$, $V_n$) to denote both a variable and its corresponding node, and the same letters in lower case (e.g., $v_1$, $v_n$) to denote the state or value (defining a particular instantiation) of the variable. Then, the joint probability distribution for any particular instantiation of all $n$ variables in a BN is given by:

$$P(v_1, \ldots, v_n) = \prod_{i=1}^{n} P(v_i \mid \phi_i) \qquad (1)$$

where $v_i$ represents the instantiation of variable $V_i$ and $\phi_i$ represents the instantiation of its parents $\Phi_i$, with $i$ varying from 1 to $n$. Parent variables are those whose instantiations directly influence other, descendant variables. The arcs (represented by arrows in the DAG) encode the conditional dependencies (i.e., which variables are parents or descendants of other variables) [9,11]. The joint probability of any instantiation of all the variables in a BN can be computed as the product of only $n$ probabilities. Thus, we can determine any probability of the form:

where $V_i$ are sets of variables with known values ($v_i$, i.e., instantiated variables). This ability to compute posterior probabilities given some evidence is called inference. When Equation (2) is used in BayNeRD to make inferences about a phenomenon, we call the variable that represents the phenomenon the target variable, and the variables that can be used to describe an outline of the phenomenon (i.e., those that are somehow related to it) the context variables.
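The factorization of the joint distribution into one conditional probability per node, and inference by conditioning on observed variables, can be illustrated with the three-node network of Figure 1 (T is a parent of A, and both are parents of S). The following is a minimal Python sketch, not the BayNeRD R implementation: all variables are treated as binary, the probability tables are invented for illustration, and the function names `joint` and `posterior_s` are hypothetical.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
# T: terrain slope (0 = gentle, 1 = steep); A: soil aptitude (0 = low, 1 = high);
# S: soybean occurrence (0 = absent, 1 = present).
P_T = {0: 0.8, 1: 0.2}
P_A_given_T = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.6, 1: 0.4}}  # P_A_given_T[t][a]
P_S_given_TA = {  # P_S_given_TA[(t, a)][s]
    (0, 0): {0: 0.7, 1: 0.3},
    (0, 1): {0: 0.4, 1: 0.6},
    (1, 0): {0: 0.95, 1: 0.05},
    (1, 1): {0: 0.8, 1: 0.2},
}


def joint(t, a, s):
    """P(t, a, s) as the product of one conditional probability per node."""
    return P_T[t] * P_A_given_T[t][a] * P_S_given_TA[(t, a)][s]


def posterior_s(evidence):
    """P(S | evidence) by enumeration; evidence maps 'T' and/or 'A' to a state."""
    scores = {0: 0.0, 1: 0.0}
    for t, a, s in product((0, 1), repeat=3):
        state = {"T": t, "A": a}
        # Keep only the instantiations that agree with the observed values.
        if all(state[name] == value for name, value in evidence.items()):
            scores[s] += joint(t, a, s)
    total = scores[0] + scores[1]
    return {s: p / total for s, p in scores.items()}


total_mass = sum(joint(t, a, s) for t, a, s in product((0, 1), repeat=3))
post_full = posterior_s({"T": 0, "A": 1})   # both context variables observed
post_partial = posterior_s({"A": 1})        # T unobserved, summed out
print(total_mass, post_full, post_partial)
```

When every parent of S is observed, the posterior reduces to the corresponding row of the conditional table; when T is unobserved, the enumeration sums it out. Enumeration is exponential in the number of variables, so it is only meant to make the mechanics visible on a small network.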

## 3. Framework of the Implemented BayNeRD Algorithm in R Software

#### 3.1. Target Variable

#### 3.2. Context Variables

#### 3.3. Designing the Bayesian Network Graphical Model

#### 3.4. Discretization and Probability Functions

#### 3.5. Computing the Probability Image

BayNeRD also evaluates how the context variables $V_1$, $V_2$, … and $V_n$ individually influence the probability computed for Y, by computing Kullback-Leibler (KL) divergences between conditional and marginal probabilities in the BN model.
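To make the idea of comparing conditional and marginal probabilities concrete, the sketch below computes, for a single context variable, the KL divergence between each conditional distribution of the target and the marginal distribution of the target, averaged over the states of the context variable. The exact formulation used by BayNeRD is not given in this section, so the weighting by the state probabilities, the base-2 logarithm, and the function names `kl_divergence` and `influence` are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def influence(p_v, p_y_given_v):
    """Expected KL divergence between P(Y | V = v) and the marginal P(Y).

    p_v[k] is P(V = k); p_y_given_v[k][y] is P(Y = y | V = k).
    """
    n_states = len(p_y_given_v[0])
    # Marginal of the target, obtained by summing out the context variable.
    p_y = [
        sum(pv * cond[y] for pv, cond in zip(p_v, p_y_given_v))
        for y in range(n_states)
    ]
    return sum(
        pv * kl_divergence(cond, p_y)
        for pv, cond in zip(p_v, p_y_given_v)
        if pv > 0
    )


# Invented tables: a strongly related, a weakly related and an unrelated variable.
strong = influence([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
weak = influence([0.5, 0.5], [[0.52, 0.48], [0.48, 0.52]])
unrelated = influence([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
print(strong, weak, unrelated)
```

Under these assumptions the value is zero when the target is independent of the context variable and grows as the conditional distributions move away from the marginal, which is the qualitative behaviour the text relies on when ranking context variables.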

#### 3.6. Selecting the Target Probability Value

## 4. Case Study of Soybean Mapping in Brazil: Materials and Research Methods

^{2}[32]. Figure 2 shows the location of Mato Grosso State, highlighting thirty 30 × 30 km plots (and the Landsat path/row covering them) of reference data produced by Epiphanio et al. [33] for the crop year 2005/2006 (i.e., from August 2005 to July 2006) using visual interpretation of Landsat-5/TM images and field data. Additional data such as indigenous lands, conservation units, mapped forests and floodplains were used to mask out areas of no interest for mapping soybean (as will be described further).

#### 4.1. Variables

- (1) Target variable: soybean occurrence (S), corresponding to the studied phenomenon, represented by a thematic map with four classes for the crop year 2005/2006: (i) target presence observed (i.e., soybean); (ii) target absence observed (i.e., non-soybean); (iii) missing data (i.e., no observations); and (iv) pixels outside the study area. This thematic map, produced by Epiphanio et al. [33], was used as a reference in this study. In the BayNeRD modelling, S = s, where s = 1 for soybean presence and s = 0 for soybean absence. Two thirds of the pixels in each of the thematic classes soybean and non-soybean were randomly selected from the reference map to compose the reference data for training. The remaining third of the reference map pixels was set aside for accuracy assessment (reference data for testing).
- (2) Context variables: the variables selected and available to compose the model are listed in Table 1. From expert knowledge, each context variable is known to influence soybean occurrence (S).
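The per-class random split of the reference pixels into training (two thirds) and testing (one third) data can be sketched as follows. This is an illustrative Python sketch rather than the code used in the study: the function name `split_reference`, the seed, the use of `None` for pixels without observations, and the toy label list are all assumptions.

```python
import random


def split_reference(labels, train_fraction=2 / 3, seed=0):
    """Return (train_idx, test_idx), sampling each observed class separately.

    labels[i] is 1 (soybean), 0 (non-soybean) or None (no usable observation).
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, s in enumerate(labels) if s == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


# Toy reference map flattened to a list: 30 soybean, 60 non-soybean, 10 unobserved.
reference = [1] * 30 + [0] * 60 + [None] * 10
train_idx, test_idx = split_reference(reference)
print(len(train_idx), len(test_idx))
```

Sampling within each class keeps the soybean to non-soybean proportion of the reference map in both subsets, and pixels with missing data or outside the study area never enter either subset.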

#### 4.2. Bayesian Network Model

#### 4.3. Discretization and Probability Functions

#### 4.4. PI

## 5. Results and Discussion

#### 5.1. Probability Image (PI)

The largest KL divergences were obtained for the two CEI variables ($KL_C = 0.28$ and $KL_L = 0.16$, i.e., the KL divergences for C and L, respectively). This means that, as pointed out by Risso et al. [64], a proper vegetation index taken at key dates over the crop calendar can be used to identify specific crops such as soybean [69]. In fact, because it detects soybean areas reliably and is practical to apply, CEI is also used to monitor soybean plantations in the Brazilian Amazon Biome in the context of the Soy Moratorium [65,66]. For the remaining context variables A, T, W, and R, the KL divergences were 0.009, 0.002, 0.003, and 0.0001, respectively. This result means that soil type influenced the calculated probability of soybean presence more than terrain slope, water distance, and especially the distance to a road.

If a context variable has only a weak relationship with soybean occurrence (for example W, which presented $KL_W = 0.003$), any decrease in the calculated probability of soybean presence is likely to be very small. However, if the context variable has a strong relationship with soybean occurrence (for example C, which presented $KL_C = 0.28$), any unfavorable condition of this variable is likely to decrease soybean probability values substantially. Additionally, the effect of mixing within a pixel of 250 × 250 m (defined as our nominal spatial resolution), especially at the boundaries of the discretized intervals, can be seen in Figure 10, which shows both orange and light-green pixels surrounding green pixels in the PI.

#### 5.2. Creating Thematic Maps from the PI

## 6. Conclusion

## Acknowledgments

## Conflicts of Interest

**Figure 1.** Directed Acyclic Graph (DAG) representing a hypothetical BN graphical model where the target variable soybean occurrence (S) is influenced by two context variables: terrain slope (T) and soil aptitude (A). Since soil formation processes are strongly influenced by terrain slope, T is also parent of A. Variables are represented by nodes and dependences are represented by arcs between pairwise nodes.

**Figure 2.** Study area corresponding to Mato Grosso State, Brazil. The analysis was only performed in areas that were not masked out.

**Figure 3.** Summary of the procedures used in the case study of applying BayNeRD to identify soybean plantations in Mato Grosso State, Brazil. Table 1 provides a description of the variables used.

**Figure 4.** Directed Acyclic Graph (DAG) encoding assertions of conditional (in)dependence among the variables and representing the designed Bayesian Network graphical model for the case study of soybean occurrence in Mato Grosso.

**Figure 5.** Discretization of context variable terrain slope (T) into three intervals. The percentage at the top of each bar represents the probability of finding a pixel within the defined interval limits, e.g., P(−∞ ≤ T = t < 0.06) = 82.9%; and the percentage at the bottom of each bar represents the conditional probability of soybean presence given the defined interval limits for T, e.g., P(S = 1 | −∞ ≤ T = t < 0.06) = 53.6%.

**Figure 6.** (**a**) Histogram of context variable CEI value in the last crop year (L); (**b**) Discretization of L into four intervals. The percentage at the top of each bar represents the probability of finding a pixel within the defined interval limits, e.g., P(0.26 ≤ L = l < +∞) = 7.0%; and the percentage at the bottom of each bar represents the conditional probability of soybean presence given the defined interval limits for L, e.g., P(S = 1 | 0.26 ≤ L = l < +∞) = 95.4%.

**Figure 7.** Histogram of CEI values observed in the current crop year (C) and boxplot showing the strong relationship between soybean presence (S = 1) and C greater than 0.2.

**Figure 9.** Probability Image (PI) of soybean presence for the entire Mato Grosso State, Brazil. Main soybean producer centers and the capital, Cuiabá, are highlighted. The color indicates the calculated probability of soybean presence in 2005/2006 given the observations made for the context variables, as expressed by Equation (6).

**Figure 10.** Probability Image (PI) of soybean presence and six context variables (described in Table 1) zoomed in on the central part of the Sapezal municipality. The legend for the context variables followed the intervals stated in Table 2. Regions labeled 1, 2, and 3 show respectively, ideal, intermediate and poor conditions for soybean cultivation.

**Figure 11.** Receiver Operating Characteristic (ROC) curve, depicting sensitivity and specificity indices associated with thematic maps generated from the Probability Image (PI) by varying the Target Probability Value (TPV) from 0% to 100%. The circle points out the best TPV according to the chosen criterion.

**Figure 12.** Accuracy indices associated with thematic maps generated from the Probability Image (PI) by varying the Target Probability Value (TPV) from 0% to 100%. The vertical line identifies the best TPV, according to the chosen criterion, highlighting the accuracy achieved according to each index (described in the legend).
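The procedure summarized in the captions of Figures 11 and 12, thresholding the PI at each candidate TPV and scoring the resulting thematic map against the test reference, can be sketched in Python as below. The criterion used to pick the best TPV is not stated in this part of the text, so the sketch assumes one common choice, the point closest to perfect sensitivity and specificity; that criterion, the function names, and the toy data are illustrative assumptions only.

```python
def confusion_at(tpv, probabilities, reference):
    """Label a pixel soybean when its probability >= tpv; return (tp, fp, tn, fn)."""
    tp = fp = tn = fn = 0
    for p, s in zip(probabilities, reference):
        predicted = p >= tpv
        if predicted and s == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif s == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def roc_points(probabilities, reference, steps=100):
    """(tpv, sensitivity, specificity) for TPV = 0, 1/steps, ..., 1.

    The reference must contain at least one pixel of each class.
    """
    points = []
    for k in range(steps + 1):
        tpv = k / steps
        tp, fp, tn, fn = confusion_at(tpv, probabilities, reference)
        points.append((tpv, tp / (tp + fn), tn / (tn + fp)))
    return points


def best_tpv(points):
    """Assumed criterion: smallest distance to sensitivity = specificity = 1."""
    return min(points, key=lambda pt: (1 - pt[1]) ** 2 + (1 - pt[2]) ** 2)


# Toy test pixels: PI values and the corresponding reference labels.
pi_values = [0.05, 0.10, 0.15, 0.20, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90, 0.95]
truth = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
roc = roc_points(pi_values, truth)
best = best_tpv(roc)
print(best)
```

Other accuracy indices (for example overall accuracy or a kappa-type index) can be computed from the same four counts, and a different criterion simply replaces the key function in `best_tpv`.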

Variable | Description |
---|---|
C | CEI^{*} value in the Current crop year (2005/2006) |
L | CEI^{*} value in the Last crop year (2004/2005) |
A | Soil Aptitude |
T | Terrain slope (given in %) |
W | Distance to the nearest Water body (given in km) |
R | Distance to the nearest Road (given in km) |

^{*} Crop Enhancement Index [46].

**Table 2.** Summary of the intervals limits defined for each of the six context variables, described in Table 1.

Interval # | C | L | A | T | W | R |
---|---|---|---|---|---|---|
1 | [−∞; 0.05) | [−∞; 0.05) | low | [−∞; 0.06) | [−∞; 0.5) | [−∞; 3.0) |
2 | [0.05; 0.20) | [0.05; 0.20) | high | [0.06; 0.12) | [0.5; 1.0) | [3.0; 8.0) |
3 | [0.20; 0.26) | [0.20; 0.26) | | [0.12; +∞) | [1.0; 2.0) | [8.0; +∞) |
4 | [0.26; +∞) | [0.26; +∞) | | | [2.0; +∞) | |
# of intervals | 4 | 4 | 2 | 3 | 4 | 3 |
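As an illustration of how the interval limits of Table 2 turn continuous observations into discrete states, and of how the two percentages shown per bar in Figures 5 and 6 can be estimated from training pixels, consider the Python sketch below. The break points are copied from Table 2; the function names, the sample values and the labels are invented, and this is not the BayNeRD R code.

```python
from bisect import bisect_right

# Break points from Table 2 (A is categorical, so it needs no break points).
BREAKS = {
    "C": [0.05, 0.20, 0.26],
    "L": [0.05, 0.20, 0.26],
    "T": [0.06, 0.12],
    "W": [0.5, 1.0, 2.0],
    "R": [3.0, 8.0],
}


def interval_of(variable, value):
    """1-based interval number of a value, using left-closed intervals [a; b)."""
    return bisect_right(BREAKS[variable], value) + 1


def interval_statistics(variable, values, soybean):
    """Per interval: (P(interval), P(S = 1 | interval)) estimated from samples."""
    n_intervals = len(BREAKS[variable]) + 1
    counts = [0] * n_intervals
    presences = [0] * n_intervals
    for value, s in zip(values, soybean):
        k = interval_of(variable, value) - 1
        counts[k] += 1
        presences[k] += s
    total = len(values)
    return [
        (counts[k] / total, presences[k] / counts[k] if counts[k] else None)
        for k in range(n_intervals)
    ]


# Made-up slope samples and soybean labels, for illustration only.
slopes = [0.01, 0.03, 0.05, 0.06, 0.10, 0.30]
soy = [1, 1, 0, 0, 1, 0]
stats = interval_statistics("T", slopes, soy)
print(stats)
```

`bisect_right` places a value equal to a break point in the upper interval, which matches the left-closed, right-open notation of Table 2. An interval with no training pixels yields `None` for the conditional probability rather than a guessed value.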

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

