
This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Improvements in satellite sensor characteristics motivate the development of new techniques for satellite image classification. Spatial information seems to be critical in classification processes, especially for heterogeneous and complex landscapes such as those observed in the Mediterranean basin. In our study, a spectral classification method for LANDSAT-5 TM imagery that uses several binomial logistic regression models was developed, evaluated and compared with the familiar parametric maximum likelihood algorithm. The classification approach based on logistic regression modelling was extended to a contextual one by using autocovariates to consider the spatial dependencies of every pixel with its neighbours. Finally, the maximum likelihood algorithm was extended to a contextual classifier by considering typicality, a measure that indicates the strength of class membership. Logistic regression for broad-scale land cover classification achieved a higher overall accuracy (75.61%) than the maximum likelihood algorithm (64.23%), although the difference was not statistically significant, and it remained more accurate even when the latter was refined following a spatial approach based on Mahalanobis distance (66.67%). Including the spatial autocovariate in the logistic models significantly improved the fit of the models and increased the overall accuracy from 75.61% to 80.49%.

A major part of research in satellite remote sensing is dedicated to the optimization of computer-aided classification processes for identifying and mapping various land cover/use types [

Multispectral classification approaches that rely only on information extracted from single pixels (known as per-pixel spectral classifiers) allocate each pixel to an output class on the basis of the relative similarity (distance) of the pixel's vector x to the mean vector of each class, derived from user-selected training data. The range of spectral classifiers is extensive, including methods like the k-nearest neighbour [

Recently, progress in satellite sensor characteristics has led to high resolution scene models (H-resolution) [

In our study, a spectral land cover classification method for a LANDSAT-5 TM image that uses several binomial logistic regression models was developed, evaluated and compared with the familiar parametric Maximum Likelihood (ML) algorithm. The classification approach based on logistic regression modelling was extended to a contextual one by using autocovariates to consider the spatial dependencies of every pixel with its neighbours. Autologistic regression modelling has already been introduced in remote sensing studies as an approach for classifying satellite data; however, it has been limited to binary classification schemes, for instance in flood zonation and burned land mapping [

The study area is located in the uppermost part of the Kassandra Peninsula in NC Greece (

The study area has a Mediterranean-type climate, and its bioclimate is characterized as semi-arid, with severe summer droughts and relatively high humidity throughout the year. The area is subject to strong human influence and high tourist pressure, which explains the fragmented character of the landscape. The relief is gentle, with mild slopes, so topographic shading is not severe.

A Landsat-5 Thematic Mapper image (path 184; row 032) was acquired on 11 May 1997. Haze removal was applied to the LANDSAT TM image by subtracting the amount by which each band's histogram is shifted from the origin due to atmospheric scattering. Following this dark pixel subtraction approach, the minimum value of each spectral channel was subtracted from each pixel brightness in that channel [
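The dark pixel subtraction described above can be sketched in a few lines, assuming the image is held as a NumPy array of shape (bands, rows, cols) of raw digital numbers. The function name is ours, and the sketch uses the strict per-band minimum exactly as the text describes.

```python
import numpy as np

def dark_object_subtraction(image: np.ndarray) -> np.ndarray:
    """Subtract each band's minimum DN from every pixel of that band.

    image: array of shape (bands, rows, cols) with raw digital numbers.
    Returns a float array of the same shape whose per-band minimum is 0.
    """
    img = image.astype(np.float64)
    # Per-band minimum with shape (bands, 1, 1) so it broadcasts over rows and cols
    band_min = img.min(axis=(1, 2), keepdims=True)
    return img - band_min

# Toy 2-band, 2x2 image (illustrative values only)
tm = np.array([[[12, 15], [20, 13]],
               [[7, 9], [8, 30]]])
corrected = dark_object_subtraction(tm)
print(corrected)
```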

The Landsat TM image was orthorectified using 54 ground control points identified on 1:5,000 scale orthophotographs produced from a 1996-1997 national aerial photography campaign and a digital elevation model constructed from contour lines with a 20 m interval. Orthorectification ensured that spatial inaccuracies caused by the irregular terrain were minimized. The total root mean square error (RMS error) was about 0.6 pixels.
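The total RMS error is conventionally computed from the residuals at the ground control points. A minimal sketch, assuming the residuals are available as (dx, dy) pairs in pixels; the three residuals below are made up for illustration and are not the study's 54 GCPs.

```python
import math

def total_rms_error(residuals):
    """Total RMS error of ground control point residuals.

    residuals: sequence of (dx, dy) pairs, e.g. in pixels.
    Returns sqrt(mean(dx**2 + dy**2)) in the same unit.
    """
    res = list(residuals)
    return math.sqrt(sum(dx * dx + dy * dy for dx, dy in res) / len(res))

# Made-up residuals for three GCPs, in pixels
print(round(total_rms_error([(0.3, -0.4), (-0.6, 0.0), (0.0, 0.8)]), 3))
```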

Initially, six informational classes (

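The Maximum Likelihood classifier is based on the Gaussian estimate of the probability density function for each class. Assuming equal prior probabilities for all classes, the probability density function is calculated for each class by [

$$P(X_i \mid k) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_k|^{1/2}} \exp\left[-\frac{1}{2}\,(X_i - M_k)^{T}\,\Sigma_k^{-1}\,(X_i - M_k)\right]$$

where $X_i$ is the spectral vector of pixel $i$, $M_k$ and $\Sigma_k$ are the mean vector and variance-covariance matrix of class $k$ estimated from the training data, and $n$ is the number of spectral channels.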

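Finally, each pixel is allocated to the class with the highest probability density or, when prior probabilities are not equal, to the class with the highest a posteriori probability of membership obtained by Bayes' theorem [

$$P(k \mid X_i) = \frac{P_k\, P(X_i \mid k)}{\sum_{c} P_c\, P(X_i \mid c)}$$

where $P_k$ is the prior probability of class $k$ and the sum runs over all classes $c$.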
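A NumPy sketch of the maximum likelihood rule described above: Gaussian class-conditional densities combined with prior probabilities through Bayes' theorem. Class means and covariance matrices are assumed to have been estimated from the training pixels, and the function names are ours. The computation is done in log space to avoid numerical underflow, and the example shows how unequal priors can change the label of an ambiguous pixel.

```python
import numpy as np

def gaussian_log_density(X, mean, cov):
    """log P(X_i | k) for every row X_i of X under a multivariate normal class model."""
    n = X.shape[1]
    diff = X - mean
    # Squared Mahalanobis distance (X_i - M_k)^T Sigma_k^-1 (X_i - M_k), one value per pixel
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + d2)

def ml_classify(X, means, covs, priors=None):
    """Assign each pixel (row of X) to the class with the highest posterior probability."""
    k = len(means)
    priors = np.full(k, 1.0 / k) if priors is None else np.asarray(priors, dtype=float)
    log_joint = np.column_stack(
        [gaussian_log_density(X, m, c) for m, c in zip(means, covs)]
    ) + np.log(priors)
    # Bayes' theorem: normalise P_k * P(X_i | k) over classes (shifted to avoid underflow)
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)
    return post.argmax(axis=1), post

means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]
X = np.array([[0.2, -0.1], [4.8, 5.3], [2.6, 2.6]])
labels, post = ml_classify(X, means, covs)                     # equal priors
labels_p, _ = ml_classify(X, means, covs, priors=[0.99, 0.01])  # unequal priors
print(labels, labels_p)
```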

The image classified by the ML algorithm was spatially weighted and reclassified using the Mahalanobis distance [
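Typicality is commonly derived from the squared Mahalanobis distance: for a multivariate normal class with n bands, the squared distance follows a chi-square distribution with n degrees of freedom, so its upper-tail probability indicates how typical a pixel is of that class. The sketch below computes only this measure; the spatial weighting and reclassification rules applied in the study are not reproduced, and the function names are hypothetical. The chi-square tail is evaluated with the standard series for the regularised incomplete gamma function so that no SciPy dependency is needed.

```python
import math
import numpy as np

def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with `dof` degrees of freedom,
    via the series expansion of the regularised lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))

def typicality(x, mean, cov):
    """Chi-square probability of the pixel's squared Mahalanobis distance to a class."""
    diff = np.asarray(x, dtype=float) - mean
    d2 = float(diff @ np.linalg.solve(cov, diff))
    return chi2_sf(d2, dof=len(mean))

mean = np.array([50.0, 40.0])
cov = np.array([[4.0, 0.0], [0.0, 4.0]])
print(round(typicality([52.0, 42.0], mean, cov), 4))  # 0.3679 (D^2 = 2 with 2 bands gives exp(-1))
```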

Multiple logistic regression modelling is used to predict a binary (dichotomous) variable Y from a set of independent explanatory variables by estimating the probability of the event's occurrence. The main assumption of the logistic regression model is a linear relationship between the natural logarithm of the odds of the binary outcome (let Y take the values 1 and 0) and the independent variables. Unlike several other multivariate statistical methods, logistic regression does not require the assumption of multivariate normality.
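Written out, the linearity assumption stated above reads as follows, with p the probability that Y = 1 and x_1, …, x_m the explanatory variables (m is our notation); inverting the logit gives the predicted probability used for classification:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_m x_m \quad\Longleftrightarrow\quad p = \frac{1}{1 + \exp\left[-(\beta_0 + \beta_1 x_1 + \dots + \beta_m x_m)\right]}$$

For example, a linear predictor of 0 gives p = 0.5, and a linear predictor of 2 gives p = 1/(1 + e^{-2}) ≈ 0.88.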

Logistic regression may prove useful for the classification of remotely sensed satellite data, especially when the independent variables (spectral observations) do not follow the normal distribution. The main consideration in implementing logistic regression modelling in the classification process is to express the classification problem in a binary (dichotomous) way, i.e. to consider the classification categories two at a time [

The flowchart of classification using logistic regression modelling is presented in

Assessment of training areas for each informational class and extraction of DN values. The spectral channels of the TM imagery are treated as independent variables, while the land cover category is the dependent variable.

T groups (

Using a forward selection procedure based on the likelihood ratio statistic, the coefficients of each model are estimated with a maximum of three explanatory variables. The independent variables of each model are those, out of the seven available bands, that best discriminate the corresponding informational class.

The logistic regression models are applied and

The final classified image is produced by assigning to each pixel the land cover category with the highest probability value.
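The steps above can be sketched in NumPy as follows, assuming training pixels are rows of a (pixels × 7) array of band values. One binary model is fitted per class (here coded 1 for the class and 0 for all other training pixels, our assumption), bands are added by forward selection on the likelihood-ratio statistic up to three per model, and each pixel goes to the class whose model gives the highest probability. The Newton-Raphson fitter, the 3.84 entry threshold (chi-square, 1 df, 5% level) and all function names are our assumptions rather than the software used in the study; the synthetic data are illustrative only.

```python
import numpy as np

def fit_logistic(X, y, iters=50):
    """Logistic regression fitted by Newton-Raphson (IRLS).
    X: (n, p) covariates without intercept; y: (n,) array of 0/1.
    Returns (coefficients with intercept first, log-likelihood)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        H = A.T @ (A * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(A.shape[1])
        step = np.linalg.solve(H, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-A @ beta)), 1e-12, 1 - 1e-12)
    return beta, float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def forward_select(X, y, max_vars=3, crit=3.84):
    """Add, one at a time, the band with the largest likelihood-ratio statistic
    2 * (LL_new - LL_old); stop below `crit` or after `max_vars` bands."""
    chosen = []
    _, ll_old = fit_logistic(X[:, []], y)  # intercept-only model
    while len(chosen) < max_vars:
        best = None
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            _, ll = fit_logistic(X[:, chosen + [j]], y)
            if best is None or 2.0 * (ll - ll_old) > best[1]:
                best = (j, 2.0 * (ll - ll_old), ll)
        if best is None or best[1] < crit:
            break
        chosen.append(best[0])
        ll_old = best[2]
    beta, _ = fit_logistic(X[:, chosen], y)
    return chosen, beta

def classify(X, models):
    """models: one (bands, coefficients) pair per class.
    Each pixel is assigned to the class with the highest predicted probability."""
    probs = np.column_stack([
        1.0 / (1.0 + np.exp(-(beta[0] + X[:, bands] @ beta[1:])))
        for bands, beta in models
    ])
    return probs.argmax(axis=1), probs

# Synthetic 7-band training pixels for three classes (illustrative only)
rng = np.random.default_rng(0)
centres = np.array([[40, 30, 60, 20, 50, 35, 45],
                    [60, 50, 40, 35, 70, 55, 30],
                    [30, 45, 55, 60, 40, 25, 65]], dtype=float)
labels = np.repeat(np.arange(3), 100)
X = centres[labels] + rng.normal(0.0, 15.0, size=(300, 7))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardising helps Newton-Raphson converge
models = [forward_select(X, (labels == k).astype(float)) for k in range(3)]
pred, probs = classify(X, models)
print([bands for bands, _ in models], round(float((pred == labels).mean()), 3))
```

Because each binary model is fitted independently, the per-class probabilities of a pixel need not sum to one; only their ranking is used for the assignment.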

The autologistic regression model, which results from adding an autocovariate component to an ordinary logistic model, makes it possible to integrate spatial information into the modelling. The integration of the autocovariate is based on the principle that adjacent pixels are more likely to belong to the same class. Therefore, the probability that a candidate pixel belongs to a certain class depends not only on its spectral information but also on whether its neighbouring pixels belong to that class [

The procedure includes the following steps [

Estimation of the predicted probabilities of the binary response variable using the ordinary multiple logistic regression model.

Estimation of the autocovariate component from the predicted probabilities using a moving window. The autocovariate component is then incorporated into the ordinary multiple logistic regression model as a new covariate.

Estimation of the coefficients of the autologistic multiple regression model including the original covariates (three spectral channels) and the autocovariate component. The procedure can be repeated from step 2 using the estimated probabilities of step 3.
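A minimal NumPy sketch of the three steps for a single class, assuming the class is coded as a 0/1 map y over a regular pixel grid and the covariates are stored as a (bands, rows, cols) array. The autocovariate is the inverse-distance-weighted mean of the eight neighbouring predicted probabilities, normalised by the sum of the weights; that normalisation, the edge replication at image borders and the number of repetitions are our assumptions, and the function names are hypothetical. The synthetic map contains isolated "noise" pixels so that the example is not perfectly separable.

```python
import numpy as np

def fit_logistic(A, y, iters=50):
    """Newton-Raphson logistic fit; A already includes the intercept column."""
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        H = A.T @ (A * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(A.shape[1])
        step = np.linalg.solve(H, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def autocovariate(prob):
    """Inverse-distance-weighted mean over the 8 neighbours of every cell:
    weight 1 for edge neighbours, 1/sqrt(2) for diagonal ones.
    Image borders are handled by edge replication (an assumption)."""
    d = 2 ** -0.5
    w = np.array([[d, 1.0, d], [1.0, 0.0, 1.0], [d, 1.0, d]])
    padded = np.pad(prob, 1, mode="edge")
    rows, cols = prob.shape
    total = np.zeros((rows, cols))
    for di in range(3):
        for dj in range(3):
            total += w[di, dj] * padded[di:di + rows, dj:dj + cols]
    return total / w.sum()

def autologistic(bands, y, repeats=2):
    """bands: (n_bands, rows, cols) covariates; y: (rows, cols) 0/1 map of one class."""
    rows, cols = y.shape
    A = np.column_stack([np.ones(rows * cols), bands.reshape(len(bands), -1).T])
    # Step 1: ordinary logistic model and its predicted probabilities
    beta = fit_logistic(A, y.ravel())
    prob = (1.0 / (1.0 + np.exp(-A @ beta))).reshape(rows, cols)
    for _ in range(repeats):
        # Step 2: autocovariate from the current predicted probabilities
        A_auto = np.column_stack([A, autocovariate(prob).ravel()])
        # Step 3: refit with the autocovariate as an extra covariate, then repeat
        beta = fit_logistic(A_auto, y.ravel())
        prob = (1.0 / (1.0 + np.exp(-A_auto @ beta))).reshape(rows, cols)
    return beta, prob

# Synthetic example: the left half of a 30x30 grid belongs to the class,
# with about 15% of the cells flipped to simulate isolated pixels
rng = np.random.default_rng(1)
truth = np.zeros((30, 30))
truth[:, :15] = 1.0
y = np.where(rng.random(truth.shape) < 0.15, 1.0 - truth, truth)
bands = np.stack([y + rng.normal(0.0, 1.0, y.shape) for _ in range(2)])
beta, prob = autologistic(bands, y)
print(np.round(beta, 2))  # intercept, two band coefficients, autocovariate coefficient
```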

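The formula of the autologistic regression model, based on the equation of the ordinary logistic regression for a grid of cells, is the following:

$$P_k(g) = \frac{\exp(z_{kg})}{1 + \exp(z_{kg})}, \qquad z_{kg} = \beta_{k0} + \sum_{j} \beta_{kj}\, x_{jg} + \gamma_k\, \mathrm{autocov}_{kg}, \qquad \mathrm{autocov}_{kg} = \frac{\sum_{i \in N_g} w_{gi}\, \hat{P}_k(i)}{\sum_{i \in N_g} w_{gi}}$$

where $P_k(g)$ is the probability that cell $g$ belongs to class $k$, $x_{jg}$ is the value of spectral channel $j$ in cell $g$, $\beta_{k0}$, $\beta_{kj}$ and $\gamma_k$ are the model coefficients, $N_g$ is the set of neighbouring cells of $g$, $\hat{P}_k(i)$ is the predicted probability of neighbour $i$ from the previous step and $w_{gi}$ is the weight of neighbour $i$.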

In our study, the autologistic regression models were developed using the same covariates (spectral bands) as the original logistic models, in order to test their relative significance and validity. Only the eight nearest neighbours were considered, and inverse distance weighting was applied.
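With distances measured between pixel centres in units of pixel spacing, the eight-neighbour inverse distance weights take only two values. Assuming the autocovariate is normalised by the sum of the weights (our assumption; with normalisation the distance unit does not matter), a worked example is:

$$w_{gi} = \frac{1}{d_{gi}} = \begin{cases} 1, & \text{edge neighbour } (d_{gi} = 1)\\ 1/\sqrt{2} \approx 0.707, & \text{diagonal neighbour } (d_{gi} = \sqrt{2}) \end{cases} \qquad \sum_{i=1}^{8} w_{gi} = 4 + 2\sqrt{2} \approx 6.83$$

If the four edge neighbours have a predicted probability of 0.9 and the four diagonal neighbours 0.5, the autocovariate is (4 × 0.9 + 2√2 × 0.5)/(4 + 2√2) = (3.6 + 1.414)/6.828 ≈ 0.734, i.e. closer neighbours pull the value more strongly.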

A stratified random sampling procedure was adopted to select a total of 123 points that were used to estimate the accuracy of the classification results. Most reference samples were located by field survey, while samples in areas that were difficult to visit were located through photo-interpretation of the available orthophotographs. Overall accuracy, per-class user's and producer's accuracies and the Kappa coefficient of agreement were estimated.
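The accuracy measures named above can all be computed from the error (confusion) matrix of the reference points. A sketch, assuming rows hold the classified labels and columns the reference labels; the 2 × 2 matrix below is a toy example, not the study's error matrix, and the function name is ours.

```python
import numpy as np

def accuracy_report(cm):
    """Overall accuracy, producer's and user's accuracies and Kappa from an error matrix.
    cm[i, j] = number of reference samples of class j mapped as class i
    (rows: classified map, columns: reference data)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / n
    producers = diag / cm.sum(axis=0)   # share of reference samples correctly mapped
    users = diag / cm.sum(axis=1)       # share of mapped samples that are correct
    # Expected chance agreement from the row and column marginals
    chance = float((cm.sum(axis=0) * cm.sum(axis=1)).sum()) / n ** 2
    kappa = (overall - chance) / (1.0 - chance)
    return overall, producers, users, kappa

overall, producers, users, kappa = accuracy_report([[20, 10], [0, 70]])
print(overall, producers, users, round(kappa, 3))
```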

A pairwise Z test statistic was also applied to the Kappa coefficients of agreement to compare statistically the results of the four classification schemes [
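The pairwise test compares two independent Kappa estimates through their variances, Z = |κ1 − κ2| / √(var(κ1) + var(κ2)), with |Z| > 1.96 indicating a significant difference at the 95% confidence level. The sketch assumes the variances have already been estimated (for instance with the large-sample delta-method formula); the input values below are hypothetical, not the study's.

```python
import math

def kappa_z(k1, var1, k2, var2):
    """Pairwise Z statistic for two independent Kappa estimates."""
    return abs(k1 - k2) / math.sqrt(var1 + var2)

# Hypothetical Kappa values and variances
z = kappa_z(0.80, 0.0016, 0.70, 0.0009)
print(round(z, 2), z > 1.96)  # significant at the 95% level if Z > 1.96
```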

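Furthermore, the classification performance of maximum likelihood and logistic regression was assessed by considering the standardized probabilities on the basis of which the pixels were assigned to a specific class [

$$SP_i = \frac{P_k(i)}{\sum_{c} P_c(i)}$$

where $P_k(i)$ is the probability of pixel $i$ for the class $k$ to which it was assigned and the sum runs over all classes $c$.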
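A sketch of how standardized probabilities and a threshold summary like the posterior-probability table can be produced, assuming each pixel's class probabilities are stored as one row of an array. We assume each table row counts the pixels whose standardized probability does not exceed the threshold; the function names and the three-pixel example are ours. Because the per-class logistic models are fitted separately, their probabilities need not sum to one, whereas ML posteriors already do, so standardization leaves them unchanged.

```python
import numpy as np

def standardized_max_prob(probs):
    """probs: (pixels, classes). Probability of the winning class divided by
    the sum of the probabilities of all classes, one value per pixel."""
    probs = np.asarray(probs, dtype=float)
    return probs.max(axis=1) / probs.sum(axis=1)

def threshold_table(p_std, thresholds=np.arange(0.1, 1.01, 0.1)):
    """(threshold, cumulative pixel count, cumulative percent) for p_std <= threshold."""
    n = len(p_std)
    rows = []
    for t in thresholds:
        count = int(np.sum(p_std <= t + 1e-9))  # tolerance for floating-point thresholds
        rows.append((round(float(t), 1), count, 100.0 * count / n))
    return rows

p = standardized_max_prob([[0.9, 0.3, 0.3], [0.2, 0.2, 0.1], [0.05, 0.9, 0.05]])
for row in threshold_table(p):
    print(row)
```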

Finally, two landscape pattern metrics, the Mean Patch Size (MPS), a common fragmentation index [
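Mean Patch Size is the total area of a class divided by its number of patches. A pure NumPy sketch, assuming patches are delineated with the 8-neighbour rule and 30 m pixels (0.09 ha); both choices and the function names are our assumptions, and the patch count, and therefore MPS, depends on the neighbour rule, as the 2 × 2 diagonal case in the checks shows.

```python
from collections import deque
import numpy as np

def count_patches(mask, eight=True):
    """Number of connected patches of True cells (8- or 4-neighbour rule)."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    rows, cols = mask.shape
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not seen[r, c]:
                # New patch: flood-fill it with a breadth-first search
                patches += 1
                seen[r, c] = True
                queue = deque([(r, c)])
                while queue:
                    i, j = queue.popleft()
                    for di, dj in steps:
                        a, b = i + di, j + dj
                        if 0 <= a < rows and 0 <= b < cols and mask[a, b] and not seen[a, b]:
                            seen[a, b] = True
                            queue.append((a, b))
    return patches

def mean_patch_size(class_map, cls, pixel_area_ha=0.09, eight=True):
    """MPS of one class in hectares: class area divided by number of patches."""
    mask = np.asarray(class_map) == cls
    n = count_patches(mask, eight)
    return mask.sum() * pixel_area_ha / n if n else 0.0

class_map = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 1], [2, 2, 2, 2]]
print(count_patches(np.array(class_map) == 1), mean_patch_size(class_map, 1))
```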

Hierarchical cluster analysis results are presented in

When the stands are sparse and the foliage coverage is not dense, the reflectance of the broadleaved species of the understory contributes significantly to the total reflectance of the pixel and results in spectral responses similar to areas dominated by shrubs in the overstory [

The overall accuracy of the image classified by the ML algorithm was 64.23%, while the accuracy of the image classified by multiple logistic regression was 75.61% (

As observed in

Individual class accuracies were low for the classes “artificial surfaces” and “barren”, especially in the maximum likelihood based approaches. As can be observed in

The use of the typicality measure as a contextual refinement to the ML algorithm improved the overall classification accuracy (66.67%) and the Kappa coefficient of agreement (0.59). The contextual process, which derives information from a predefined neighbourhood of each pixel, estimates new probabilities for every pixel. Although a pixel may have been assigned to the class with the highest a posteriori probability of membership, the contextual approach uses spatial information to reassign pixels that are unlikely to belong to that class. Uncertainties in the original classification are therefore expected to be reduced.

Similarly, the consideration of the spatial autocovariate in the logistic models (

In addition, a pairwise test statistic was applied to statistically compare the results of the four classification schemes based on the Kappa coefficient of agreement (

The improvement of classification accuracy after incorporating the autocovariate in the original logistic classification approach, and after the post-classification refinement of maximum likelihood, is reflected in the lower number of polygons resulting from the vectorization of the classified images (

Finally, landscape metrics estimated for the classification results (

The use of multiple logistic regression for broad-scale land cover classification proved more accurate than the well-known maximum likelihood classification algorithm, even when the latter was refined following a spatial approach based on Mahalanobis distance. The accuracy achieved by the multiple logistic regression approach (75.61%) supports the use of this statistical technique for classifying broad-scale land cover types from remotely sensed data. Parametric classifiers such as ML, which presuppose normal distributions, can be inadequate, especially when the data have multimodal or non-normal distributions. In such cases, these traditional methods can be replaced by statistical methods that do not require those assumptions, such as logistic regression. The laborious and time-consuming implementation of logistic regression could be overcome by integrating a built-in routine into commercial image processing software.

The extension of the logistic approach to an autologistic one was successful, since it reduced the number of polygons in the classified image and improved the overall accuracy (80.49%). Classification methods that use only the spectral information of satellite sensor imagery can be insufficient when classes are spectrally similar. In such cases, spatial information can help compensate for the limited spectral separability.

Location of the study area and colour composite (RGB: TM-743) of the Landsat TM image used in the study.

Flowchart of the classification approach based on logistic regression. Following the extraction of the training areas,

Hierarchical cluster analysis of the training areas originally delineated in the Landsat TM image. Shrublands and forests are clustered together at a very short distance, indicating spectral similarities that may cause spectral confusion and misclassification.

Descriptive statistics of the final classification scheme. Vertical bars extend 1 standard deviation around the mean.

Observed frequency distributions and Kolmogorov-Smirnov values d calculated for each class. None of the classes was found to be normally distributed at the 95% confidence level according to the estimated d values.

Land cover map produced with the autologistic regression modeling approach. Grid coordinates are in meters in the Greek grid projection.

Frequency distribution of polygon size resulting after vectorization of the classified images.

Landscape pattern metrics of the classified images of the four classification approaches.

Land cover types in the study area and corresponding training areas.

| Land cover type | Description | Training areas |
|---|---|---|
| Artificial surfaces | Urban areas and man-made structures (roads, camps) | 6 / 572 |
| Forest | Coniferous forests ( | 6 / 632 |
| Shrubs | Shrublands mixed with interspersed | 5 / 575 |
| Grass | Cultivated crops and pastures which at the time of image acquisition, due to the vegetation phenology and the area's climatic conditions, are in full bloom | 6 / 812 |
| Barren | Bare rocks, very sparsely vegetated areas, and non-cultivated farmlands | 5 / 962 |
| Water | Wetlands and sea | 5 / 732 |

Accuracy measures of the four classification methods, reference points and area extent of each land cover type of the final map resulting from the autologistic regression modeling approach.

| Land cover type | Area of the map (km²) / Reference points | ML: Producers (%) | ML: Users (%) | Contextual ML: Producers (%) | Contextual ML: Users (%) | Logistic: Producers (%) | Logistic: Users (%) | Autologistic: Producers (%) | Autologistic: Users (%) |
|---|---|---|---|---|---|---|---|---|---|
| Artificial surfaces | 37.9 / 13 | 100.00 | 27.08 | 100.00 | 28.26 | 38.46 | 55.56 | 46.15 | 40.00 |
| Forest | 48 / 29 | 82.76 | 88.89 | 89.66 | 89.66 | 89.66 | 86.67 | 86.21 | 96.15 |
| Grass | 93.3 / 32 | 62.50 | 83.33 | 68.75 | 88.00 | 75.00 | 75.00 | 87.50 | 82.35 |
| Water | 817.7 / 16 | 93.75 | 100.0 | 93.75 | 100.0 | 100.00 | 100.0 | 100.00 | 100.0 |
| Barren | 97 / 33 | 21.21 | 77.78 | 18.18 | 75.00 | 66.67 | 61.11 | 72.73 | 75.00 |
| Overall accuracy (%) | | 64.23 | | 66.67 | | 75.61 | | 80.49 | |
| Kappa | | 0.56 | | 0.59 | | 0.68 | | 0.75 | |

Posterior probabilities of the classified images estimated by the logistic regression and the maximum likelihood algorithm.

| Probability threshold | Cumulative number of pixels | Percent (%) | Cumulative percent (%) | Cumulative number of pixels | Percent (%) | Cumulative percent (%) |
|---|---|---|---|---|---|---|
| 0.1 | 32556 | 2.68 | 2.68 | 1514 | 0.12 | 0.12 |
| 0.2 | 35805 | 0.27 | 2.95 | 1514 | 0 | 0.12 |
| 0.3 | 39052 | 0.27 | 3.21 | 1514 | 0 | 0.12 |
| 0.4 | 42838 | 0.31 | 3.52 | 1716 | 0.02 | 0.14 |
| 0.5 | 47126 | 0.35 | 3.88 | 3780 | 0.17 | 0.31 |
| 0.6 | 52670 | 0.46 | 4.33 | 20393 | 1.37 | 1.68 |
| 0.7 | 59719 | 0.58 | 4.91 | 39632 | 1.58 | 3.26 |
| 0.8 | 69963 | 0.84 | 5.76 | 62503 | 1.88 | 5.14 |
| 0.9 | 90133 | 1.66 | 7.41 | 98830 | 2.99 | 8.13 |
| 1.0 | 1215609 | 92.59 | 100.00 | 1215609 | 91.87 | 100.00 |

Significance matrix of the four classification approaches. Shaded cells indicate statistically significant differences at the 95% confidence level.

Maximum Likelihood (ML) | Logistic | Autologistic | |
---|---|---|---|

1.58 | |||

2.55 | 0.93 | ||

0.40 | 1.28 | 2.30 |