3.2. Visualising Change Matrices
A number of tools are available to analyse contingency tables such as correspondence matrices and to depict the results of simple statistical tests. These include agreement plots [20] and mosaic plots [21,22]. Their implementation within the vcd package in R is described in code snippets accompanying the package, and worked examples are given in Meyer et al. [23], Zeileis et al. [24] and Friendly [25]. All of the statistical analyses, tables and figures in this paper were implemented in R version 3.2.1, the open source statistical software, using the vcd and gplot packages. The data and code used in this analysis will be freely provided to interested researchers on request.
Agreement plots [20] provide a graphical representation of the diagonal and off-diagonal elements in a correspondence matrix. The agreement plots arising from the correspondence matrices in Table 1 are shown in Figure 1. Large off-diagonal values in the matrix are indicated by the areas around the diagonal, and their size, orientation and shading indicate the direction of change. A number of statements about the correspondence matrices can be very quickly deduced from Figure 1. For example, the agreement plot shows:
3.3. Comparing Changes in Two Regions Using Odds Ratios
It is possible to make a number of statements from the regional change matrices in Table 1 about the probability of change for any given class in any given region. Losses and gains are derived from the row and column marginal totals and diagonals. For example, the probabilities of Farmland loss are as follows:
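As a minimal sketch of how such probabilities follow from the marginal totals and diagonals, the Python snippet below uses a hypothetical four-class change matrix. The counts are illustrative only and are not those of Table 1; the function names, and the choice to express gains relative to the Time 2 column total, are assumptions made for this example.

```python
# Hypothetical change matrix (pixel counts); NOT the values of Table 1.
# Rows are the class at Time 1, columns are the class at Time 2.
CLASSES = ["Farmland", "Grass", "Urban", "Woodland"]
change_matrix = [
    [40, 3, 5, 2],   # from Farmland
    [1, 18, 1, 0],   # from Grass
    [0, 0, 15, 0],   # from Urban
    [1, 0, 0, 14],   # from Woodland
]


def loss_probability(matrix, k):
    """Share of class k at Time 1 that had changed class by Time 2."""
    row_total = sum(matrix[k])
    return (row_total - matrix[k][k]) / row_total


def gain_probability(matrix, k):
    """Share of class k at Time 2 that was a different class at Time 1."""
    col_total = sum(row[k] for row in matrix)
    return (col_total - matrix[k][k]) / col_total


farmland = CLASSES.index("Farmland")
p_loss = loss_probability(change_matrix, farmland)  # 10 of 50 Farmland pixels
p_gain = gain_probability(change_matrix, farmland)  # 2 of 42 Farmland pixels
print(f"P(Farmland loss) = {p_loss:.3f}")
print(f"P(Farmland gain) = {p_gain:.3f}")
```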
The objective in some land cover change studies is to compare changes in different regions, perhaps relating to management, policy or ownership. Probabilities provide useful descriptive statistics of the change but they are not directly comparable in this form as they are specific to each region. Odds ratios are widely used in land use modelling and assessment, principally to examine the underlying drivers and factors associated with land use, but they have not yet been used to compare regional differences.
Odds ratios can be used to compare any two individual regions or class-to-class changes. They indicate the relative likelihood of change between different treatments. Thus, they provide a comparative measure of change and can be used to describe regional differences, differences between land cover classes and differences in specific class-to-class changes observed in two regions. The odds ratio, θ, of the relative likelihood of change is defined as follows:
An odds ratio of 1 indicates change is equally likely to occur in both regions. If it is greater than 1, then this suggests that change is more likely to occur in Region A. If the odds ratio is less than 1, then this indicates that change is less likely in Region A than in Region B and, in this case, the ratio is inverted to describe likelihood of change in Region B relative to Region A.
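One way to write this definition, offered as a hedged sketch rather than a quotation, uses p_A and p_B for the probability of change in Regions A and B (these symbols are assumptions introduced here):

```latex
\theta = \frac{p_A / (1 - p_A)}{p_B / (1 - p_B)}
```

This form agrees with the interpretation above: θ = 1 when the odds of change are equal in both regions, and θ > 1 when change is more likely in Region A.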
To determine odds ratios, the diagonal and off-diagonal elements of the change matrices are collapsed into 2 by 2 matrices, which can then be used to calculate the relative odds of changes in one region compared to another. The overall changes in Regions 1 and 2 indicate change in 13 out of 100 pixels in Region 1 and in 26 out of 100 pixels in Region 2. This results in no change totals of 87 and 74 pixels respectively. The relative likelihood of land cover change in Region 1 compared to Region 2 is:
That is, the relative odds of change in Region 2 are 0.425⁻¹, or 2.35, times higher than in Region 1. The significance of the interactions between regions and land cover change can be tested using a χ²-test and in this case it indicates a significant difference at the 95% level between Regions 1 and 2 (p-value = 0.032).
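The calculation can be sketched in Python using only the counts quoted above (13 and 87 pixels in Region 1, 26 and 74 in Region 2). The function names are illustrative. The sketch assumes that the χ²-test was run with Yates' continuity correction, which is the default for 2 by 2 tables in R's chisq.test.

```python
import math

# Counts quoted in the text: changed and unchanged pixels out of 100 per region.
region1 = {"change": 13, "no_change": 87}
region2 = {"change": 26, "no_change": 74}


def odds_ratio(a, b):
    """Odds of change in region a relative to the odds of change in region b."""
    return (a["change"] / a["no_change"]) / (b["change"] / b["no_change"])


def chi2_test_2x2(a, b, yates=True):
    """Chi-squared test of independence for a 2 by 2 table (1 degree of freedom)."""
    observed = [[a["change"], a["no_change"]], [b["change"], b["no_change"]]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction (assumed)
            stat += diff ** 2 / expected
    # Upper tail of a chi-squared variable with 1 df: erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


theta = odds_ratio(region1, region2)
stat, p_value = chi2_test_2x2(region1, region2)
print(f"theta = {theta:.3f}, 1/theta = {1 / theta:.2f}")
print(f"chi-squared = {stat:.2f}, p-value = {p_value:.3f}")
```

With these counts the sketch gives θ ≈ 0.425 (inverse ≈ 2.35) and a p-value of about 0.032, matching the values reported in the text.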
It is also possible to calculate the relative odds and associated significance for changes to different classes. Table 2 shows the relative odds of land cover losses and gains comparing Region 1 with Region 2.
A number of significant differences in land cover change are suggested by Table 2:
the relative odds of loss from Farmland are 3.7 (0.267⁻¹) times greater in Region 2 than in Region 1;
the relative odds of gains in Farmland and Woodland area are 29.4 (0.034⁻¹) and 6.3 (0.158⁻¹) times greater in Region 2 than in Region 1.
Other losses and gains are not significant.
The gross changes may hide more subtle changes in each class. As a result, the odds ratios may present an example of Simpson’s paradox [26], where different rates (and directions) of per-class changes may be masked by the aggregate gross changes. It is possible to quantify differences in the likelihood of specific class-to-class transitions, rather than just losses and gains, and how they vary in different regions. As an example of a specific direction of change, consider the transitions from the Farmland to the Urban class in Regions 2 and 3 (Table 1). The diagonal and off-diagonal elements of the correspondence matrices are collapsed into a 2 by 2 contingency matrix, which is used to generate the relative odds of a specific land cover transition in one region compared to another, as in Table 3. The odds ratios suggest that a change from Farmland to Urban is 4.59 (0.218⁻¹) times more likely in Region 3 than in Region 2.
Of course, it is important to consider the data that are used to populate the contingency table: by including the areas that did not change as well as those that did, the correct interpretation of the odds ratio above is that change from Farmland to Urban is 4.6 (0.218⁻¹) times more likely in Region 3 than in Region 2, when all possible states of change and no change are considered. The χ²-test showed this to be significant at the 95% confidence level (p-value = 0.0098). This analysis can be further refined to consider only land use changes (i.e., without considering areas of Farmland that did not change). The data are shown in Table 4 and the odds ratio now describes a different problem: that change from Farmland to Urban is 3.8 times more likely in Region 1 than in Region 2, when only observed changes from Farmland are considered, although in this case the differences were not found to be significant (χ² p-value = 0.0956).
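The effect of the choice of comparison group can be illustrated with a short Python sketch. The two change matrices below are hypothetical (they are not the Table 1 data), and the helper names and the exact way the matrix is collapsed are assumptions: with changes_only=False the transition is compared against every other pixel, and with changes_only=True only against other changes away from the source class.

```python
# Hypothetical change matrices (rows: Time 1 class, columns: Time 2 class).
# Class order: Farmland, Grass, Urban, Woodland. NOT the values of Table 1.
FARMLAND, GRASS, URBAN, WOODLAND = range(4)
region_a = [
    [40, 3, 5, 2],
    [1, 18, 1, 0],
    [0, 0, 15, 0],
    [1, 0, 0, 14],
]
region_b = [
    [30, 2, 15, 3],
    [2, 16, 2, 0],
    [0, 0, 15, 0],
    [1, 0, 1, 13],
]


def collapse(matrix, src, dst, changes_only=False):
    """Return (transition count, comparison count) for a src -> dst change."""
    hit = matrix[src][dst]
    if changes_only:
        # Only pixels that left the src class for some other class.
        other = sum(matrix[src]) - matrix[src][src] - hit
    else:
        # Every other pixel: all other changes plus all no-change cells.
        other = sum(sum(row) for row in matrix) - hit
    return hit, other


def odds_ratio(pair_a, pair_b):
    """Odds of the transition in region a relative to region b."""
    (hit_a, other_a), (hit_b, other_b) = pair_a, pair_b
    return (hit_a / other_a) / (hit_b / other_b)


or_all = odds_ratio(
    collapse(region_a, FARMLAND, URBAN),
    collapse(region_b, FARMLAND, URBAN),
)
or_changes = odds_ratio(
    collapse(region_a, FARMLAND, URBAN, changes_only=True),
    collapse(region_b, FARMLAND, URBAN, changes_only=True),
)
print(f"all pixels:   theta = {or_all:.3f}")
print(f"changes only: theta = {or_changes:.3f}")
```

The two odds ratios differ because they answer different questions, which is why the contents of the collapsed table need to be stated alongside any reported ratio.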
Finally, the analysis can be extended to determine the relative odds of change to and from all possible classes. Figure 2 shows the odds ratios for each class-to-class pair, comparing changes in Region 2 with those in Region 3. The table elements are shaded by the significance arising from the χ²-test. Figure 2 describes the relative odds of class-to-class changes in Region 2 compared to Region 3, when only changes from the original class are considered. It is easy to identify significant regional differences (shaded in green) and to make the following statements:
3.4. Comparing Changes in More than Two Regions
The preceding analyses compared only two regions, with data collapsed into 2 by 2 contingency tables. However, in many studies, the objective is to compare more than two treatments and to evaluate differences across multiple factors. Consider, for example, the regional losses and gains in Table 5. These can be analysed using mosaic plots, which provide a method to evaluate and visualise statistical differences in contingency tables (symmetrical and non-symmetrical).
Mosaic plots were proposed by Hartigan and Kleiner [21] and extended by Friendly [22]. In these, the significance of the interactions between column and row factors is indicated by the shading, in which the standardised residuals of a log-linear model are indicated by the colour and outline of the mosaic tiles. The mosaic plot in Figure 3 has axes for the different regions being compared and the land cover change types. The size of the plot tiles is proportional to the land cover areas (counts in the contingency tables). Their shading indicates whether the combinations of groups, regions, classes, etc. occur less or more often than expected under a model of proportionality. In the examples below, tiles shaded deep blue show interactions that are significantly higher than would be expected (i.e., corresponding to combinations of change and region whose standardised residuals are greater than +4), when compared to a model of proportionally equal levels of change. Tiles shaded deep red correspond to residuals less than −4, indicating significantly lower frequencies than would be expected when compared to the model. The standardised Pearson residuals measure the deviation of each tile from independence. From Figure 3, statements can be extracted under the assumption of proportionally equal levels of change (loss and gain) for each land cover class and region. In this case, the mosaic plot indicates that the gains to Urban in Region 3 are much greater than expected.
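The residuals that drive the shading can be sketched in Python as follows. This computes plain Pearson residuals, (observed − expected) / √expected, under a model of independence; whether any further standardisation was applied in the plots is not stated, so this is an assumption. The example table reuses the change and no-change counts quoted for Regions 1 and 2 in Section 3.3, and the function names and the shade labels are illustrative.

```python
import math


def pearson_residuals(table):
    """Pearson residuals of a two-way table under row/column independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals


def shade(residual, cutoff=4.0):
    """Map a residual to the deep shading described in the text."""
    if residual > cutoff:
        return "deep blue"   # far more frequent than expected
    if residual < -cutoff:
        return "deep red"    # far less frequent than expected
    return "not deeply shaded"


# Rows: Region 1, Region 2. Columns: change, no change (counts from Section 3.3).
table = [[13, 87], [26, 74]]
res = pearson_residuals(table)
for region, row in zip(("Region 1", "Region 2"), res):
    print(region, [f"{r:+.2f} ({shade(r)})" for r in row])
```

For this small table none of the residuals exceeds ±4, so no tile would be deeply shaded even though the χ²-test on the same counts is significant at the 95% level.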
It is possible to apply a different type of analysis to the correspondence matrix in order to compare regional land use changes against a model that expects proportionally equal levels of change in each region. Generalised linear models can be used to estimate the likelihood of change as a function of the regions. The counts of change (loss) and no change are summed for each region in a table of counts. In this, the rows indicate whether change had occurred or not and the columns indicate the region, a transpose of Table 5. To test for an association, A, between the row and column effects, the Poisson regression model is applied:
where the count in column i and row j is denoted by c_ij and has a Poisson distribution, r is an intercept term, C_i is a column effect and R_j is a row effect, which is compared against the model:
where the extra term I_ij is an interaction effect between rows and columns. If this is significantly different from zero, then it suggests that there is some degree of association between the row and column effects. Values of I_ij were estimated by fitting Equation (3) to the regional data and the resulting coefficients were related to a comparative index of loss for each of the row categories, using the formula:
In summary, Equations (2)–(4) apply a generalised linear model to a cross-tabulation of how different factors interact (regions and classes) in order to predict the frequency of occurrence of the count under a Poisson distribution. Note that in the analyses below the CHANGE term in Equation (4) is used to evaluate land cover losses from Time 1 and gains at Time 2. Due to the way the interaction terms are calibrated, this compares each column category j (regions) against a “reference” category, which is usually the region with the largest area. However, in this case all of the regions have the same number of pixels and so the reference is Region 1. A value of 0 suggests the likelihood of loss for category j is the same as for the reference category. A value of +50 for category j suggests loss is one-and-a-half times as likely as in the reference category, a value of −50 that it is half as likely, and so on. The analysis of loss from a transpose of Table 5 was calculated and the results are shown in Table 6.
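The three models referred to as Equations (2)–(4) can plausibly be written as below. This is a hedged reconstruction from the term definitions and from the stated interpretation of the index (0 for equal likelihood, +50 for one-and-a-half times as likely, −50 for half as likely). The log link and the exact form of Equation (4) are assumptions.

```latex
\begin{align}
\log \mathrm{E}(c_{ij}) &= r + C_i + R_j \tag{2} \\
\log \mathrm{E}(c_{ij}) &= r + C_i + R_j + I_{ij} \tag{3} \\
\mathrm{CHANGE} &= 100 \times \left( e^{I_{ij}} - 1 \right) \tag{4}
\end{align}
```

Under this form, I_ij = 0 gives CHANGE = 0, while exp(I_ij) = 1.5 and exp(I_ij) = 0.5 give +50 and −50 respectively, in line with the interpretation given above.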
The results in Table 6 suggest that the likelihood of change in Region 2 is 135% greater than in Region 1 and that the likelihood of change in Region 3 is 215% greater than in Region 1.
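As a rough check on the Region 2 figure, the index can be reproduced in Python from the change and no-change counts quoted in Section 3.3. The sketch assumes the index takes the form 100 × (exp(I) − 1). It also relies on a standard property: in a saturated Poisson log-linear model for a two-way table with a reference category, the interaction coefficient equals the log of the odds ratio against that reference, so no model-fitting library is needed. Names are illustrative.

```python
import math

# Counts quoted in the text (pixels out of 100 per region).
counts = {
    "Region 1": {"change": 13, "no_change": 87},  # reference category
    "Region 2": {"change": 26, "no_change": 74},
}


def change_index(region, reference):
    """Comparative index of change for `region` relative to `reference`."""
    # Interaction term of the saturated log-linear model = log odds ratio.
    interaction = math.log(
        (region["change"] / region["no_change"])
        / (reference["change"] / reference["no_change"])
    )
    # Assumed form of the index: percentage difference in odds.
    return 100 * (math.exp(interaction) - 1)


index_r2 = change_index(counts["Region 2"], counts["Region 1"])
print(f"Region 2 vs Region 1: {index_r2:+.0f}%")
```

With these counts the index is about +135%, which agrees with the Region 2 value reported from Table 6.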
The application of the generalised linear models can be further extended to consider how specific class-to-class transitions vary in different regions. Consider the summary data in Table 7. This describes the changes from Farmland to Urban and to non-Urban classes (i.e., Grass and Woodland) in the three regions, ordered left to right by the largest column totals. It is possible to determine the likelihood of change to Urban in different regions relative to the region with the largest area of change, in this case Region 3. The results are shown in Table 8 and indicate that the likelihood of land cover change from Farmland to Urban is 286% greater in Region 2 than in Region 3 and 57% less in Region 1 compared to Region 3, although this difference was not found to be significant.
Finally, it is sometimes useful to be able to compare different land cover transitions. Consider the data in Table 9. They summarise the changes from Farmland to Urban and to Woodland in the three regions, again ordered left to right by the greatest volume of change. The results are shown in Table 10 and indicate that the likelihood of change from Farmland to Urban rather than from Farmland to Woodland is 200% greater in Region 2 compared to Region 3 and 57% less in Region 1 compared to Region 3, although in this case neither of these differences is significant.