In this paper, an interactive technique for extracting cartographic features from aerial and spatial images is presented. The method is essentially an interactive form of image region segmentation based on pixel grey level and texture information. The underlying segmentation method is seeded region growing. The criterion for growing regions is based on both texture and grey level, where texture is quantified using co-occurrence matrices. The Kullback distance is used with co-occurrence matrices to describe the image texture; the Theory of Evidence is then applied to merge the texture information with the grey-level information from the RGB bands. Several results from aerial and spatial images that support the technique are presented.

In order to use the huge amount of information available from high-resolution satellite and aerial images more efficiently in cartography, it will be necessary to find methods that detect objects like streets, houses, vegetation and other cartographic features in a fully automatic manner. If this were possible, a lot of work could be done faster and in a more efficient way. Generally, to detect an object in a digital image, the first step is segmentation. It was quickly recognised that cartographic feature extraction is an issue of high complexity, and until now, there has not been any generally satisfactory solution [

The broad utilisation and evolution of Geographic Information Systems (GIS) has increased the need for more rapid updating of the cartography layers on which these are built. Today, most of the cost of developing a GIS comes from the construction of its layers, as vector layers are obtained through digitisation (i.e., drawing manually on orthophotos). Thus, there is an increasing need for semiautomatic algorithms that would assist with this time-consuming task, and this is the focus of this paper.

In most cases, prior to extracting cartographic features from aerial or space images, the image must be segmented into homogeneous regions, which are then merged into higher levels to obtain the cartographic features from which the map can be drawn. The work presented in this paper is confined to the very first step of segmenting the image, and not even the whole image, but just a region that could represent a specific cartographic feature or part of one; for example, a lake in an aerial image. The only other information provided to the process is the seed point, which is given, for example, by clicking with the computer mouse on any pixel inside the region of interest. Since the objective is to find only the connected region containing the seed pixel (i.e., the clicked pixel), the algorithm can be based on a recursive region-growing technique. Another important issue is deciding when a pixel is inside or outside the studied region.

Subsequently, the Theory of Evidence (ToE) was applied in merging information coming from colour and texture. It was also necessary to tune the parameters, given the uncertainty of each source of information. In this way, different regions can be detected by the user in an interactive manner. The region detected also needed some further refinement or editing with the use of mathematical morphology, and some of the results are shown to showcase the potential of the method as a whole.

Using semantic networks or rules, the context could be studied and changes could be made on the regions as produced by the algorithm presented here. Even though the algorithm presented in this paper is the very first step of a larger process, it has value in itself as an interactive procedure, as it could be used in a productive cartographic environment after some experimental validation of the accuracy and efficacy of the method.

The algorithm is applied to different images to obtain results and to study the potential of the method for feature extraction.

Segmentation classically refers to the partitioning of the support of an image into subsets in which the texture is homogeneous and this can be achieved by thresholding. Global thresholding is generally unsatisfactory for several reasons, including presence of shadows, non-uniform illumination, and noise. This has motivated the development of hundreds of methods for image segmentation and several proposed taxonomies. In thresholding, the algorithm tries to get a set of thresholds {T_{1}, T_{2}, T_{3}, …, T_{k}} such that all pixels with grey values in the range [T_{i}, T_{i+1}), i = 0, 1, 2, …, k constitute the i-th region type (T_{0} and T_{k+1} are taken as the minimum and maximum grey values of the image, respectively). Thresholds may be detected based on histogram information or spatial information. Otsu [

The different structures of histograms reveal that the information in the histograms alone does not allow one to segment an image into different regions. Two completely different images can have the same or similar histograms, but in order to use the histogram for segmentation, it is necessary to reduce the 256 possible grey values to only a few levels or labels. To provide for this reduction, we must determine thresholds that produce labels without losing significant information about the regions. The choice of thresholds is an important factor influencing the quality of the resulting segmentation.
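The reduction from 256 grey values to a few labels can be illustrated with a minimal sketch. It assumes NumPy, a hypothetical helper name `label_image`, and the half-open interval convention [T_{i}, T_{i+1}); the three thresholds 56, 87 and 121 are the ones reported for the Ikonos example in this paper.

```python
import numpy as np

def label_image(grey, thresholds):
    """Map each grey value to the index of the threshold interval it falls in.

    A pixel with value v gets label i when T_i <= v < T_{i+1}, so k thresholds
    produce k + 1 labels (0 .. k). The helper name is illustrative only.
    """
    return np.digitize(grey, bins=np.asarray(thresholds))

# Thresholds quoted for the Ikonos red band in this paper.
thresholds = [56, 87, 121]
grey = np.array([[0, 55, 56],
                 [86, 87, 120],
                 [121, 200, 255]], dtype=np.uint8)
labels = label_image(grey, thresholds)
```

With three thresholds, the label image contains at most four values (0 to 3), which is the form in which the thresholded layer enters the later decision step.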

In what follows, the method for minimizing Kullback distance is outlined. We have extended it to consider three thresholds (i.e. four normal distributions), instead of one (i.e. two normal distributions) as is commonly used. Even though, in general, multilevel thresholding could be considered less reliable than its single-threshold counterpart [

Let _{j}_{j}

Let us suppose that the observations come from a mixture of two Gaussian distributions, i.e. the case N=2. Kullback defined a divergence distance between distributions, which Li and Lee [_{1}_{2}

To understand the meaning of _{0}_{A}_{0}_{A}

Since the first term does not depend upon unknown parameters, only the second term must be minimized; hence, we minimize the information measure

We have extended this idea to the possibility that we may find several thresholds for more than two Gaussian distributions. In this case, the respective means and variances become
_{j}_{j}_{+1}.

When _{1}) will be close to the true mean and variance of the appropriate first distribution. Likewise, the mean and variance estimated from _{j}_{j+1}) will be similar to the true mean and variance of the following distributions. The mean and variance estimated from _{N−1} + 1),…,

When we look closer at

A label image with four grey values, calculated and minimized using the Kullback distance with the histogram, is shown in

The disadvantage of this method is that the computational cost grows exponentially with each new threshold, on the order of 255^{N}, where N, as stated above, is the number of Gaussian distributions (and N-1 is the number of thresholds). On the other hand, Gagalowicz [
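The exhaustive threshold search described in this section can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the information measure is the cross-entropy term -Σ h(g) log f(g) between the normalised histogram h and a Gaussian mixture f whose weights, means and variances are estimated from the histogram segments delimited by the candidate thresholds. The function names, the variance floor `min_var` and the `step` parameter are hypothetical.

```python
import itertools
import numpy as np

def information_measure(hist, thresholds, min_var=1.0 / 12.0):
    """Cross-entropy between the histogram and a segment-wise Gaussian mixture."""
    p = hist / hist.sum()                      # normalised histogram h(g)
    g = np.arange(len(p), dtype=float)         # grey levels
    edges = [0, *thresholds, len(p)]           # segments are [lo, hi)
    mixture = np.zeros_like(p)
    for lo, hi in zip(edges[:-1], edges[1:]):
        q = p[lo:hi].sum()                     # weight of this component
        if q <= 0:
            continue                           # empty segment adds nothing
        mean = (g[lo:hi] * p[lo:hi]).sum() / q
        var = ((g[lo:hi] - mean) ** 2 * p[lo:hi]).sum() / q
        var = max(var, min_var)                # assumed floor, avoids zero variance
        mixture += q * np.exp(-(g - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    # Guard against log(0) where the mixture underflows.
    return -np.sum(p * np.log(np.maximum(mixture, 1e-300)))

def best_thresholds(hist, n_thresholds, step=1):
    """Try every increasing combination of thresholds and keep the best one."""
    candidates = range(1, len(hist), step)
    return min(itertools.combinations(candidates, n_thresholds),
               key=lambda t: information_measure(hist, t))

# Synthetic bimodal histogram: two equal Gaussians centred at 60 and 180.
levels = np.arange(256, dtype=float)
hist = np.exp(-(levels - 60) ** 2 / 200.0) + np.exp(-(levels - 180) ** 2 / 200.0)
(t,) = best_thresholds(hist, 1)
```

The number of combinations grows rapidly with the number of thresholds, which is the cost noted above; a coarser `step` is one simple way to keep a three-threshold search tractable in an interpreted language.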

Texture is an important characteristic for the analysis of many types of images. There are many possibilities for examining macroscopic or microscopic texture. Gagalowicz [

Four matrices were calculated for each of the four directions: 0°, 45°, 90° and 135°. The different directions describe the spatial relationship (or texture) in the different angular relationships. The grey level co-occurrence can be specified in a matrix of relative frequencies _{ij}

Entropy

Contrast

We have not found sufficient improvement in the algorithm using more Haralick features than the two proposed here to justify the additional computational burden; neither could we state that these features are better or worse than the others, due to the small differences observed in the segmented image when other features are employed, depending on the image studied.
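The co-occurrence matrices and the two features named above can be sketched as follows. This is a minimal illustration assuming NumPy, a pixel distance of 1, non-symmetric counting, and the usual Haralick definitions (entropy as -Σ P_ij log P_ij, contrast as Σ (i-j)^2 P_ij); the function names and the offset table are hypothetical and the window handling of the paper is not reproduced.

```python
import numpy as np

# Assumed (row, column) offsets for the four directions at distance 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def cooccurrence(img, offset, levels):
    """Relative-frequency matrix of grey-level pairs separated by `offset`."""
    dy, dx = offset
    h, w = img.shape
    # Keep only reference pixels whose neighbour lies inside the image.
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    ref = img[r0:r1, c0:c1]
    nbr = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    return counts / counts.sum()

def entropy(P):
    """Entropy of the matrix, treating 0 * log(0) as 0."""
    p = P[P > 0]
    return float(-np.sum(p * np.log(p)))

def contrast(P):
    """Contrast: squared grey-level difference weighted by frequency."""
    i, j = np.indices(P.shape)
    return float(np.sum((i - j) ** 2 * P))

# A 4 x 4 checkerboard with two grey levels.
board = np.indices((4, 4)).sum(axis=0) % 2
P0 = cooccurrence(board, OFFSETS[0], levels=2)
total_entropy = sum(entropy(cooccurrence(board, o, 2)) for o in OFFSETS.values())
```

On the checkerboard, the 0° matrix contains only the pairs (0,1) and (1,0) with equal frequency, so its entropy is log 2 and its contrast is 1. Summing the entropies over the four directions gives a single texture value per window.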

Since the information considered most relevant in segmenting the images must be obtained, let us examine how all the different kinds of information were amalgamated to establish a decision criterion for segmenting regions. A straightforward way to classify mixed data is to form for each pixel a vector _{1},…, _{M}_{1} is assigned a label (for example, [0,3]), depending upon which interval between the thresholds the intensity falls in. In several tests with images, this criterion frequently described the regions in a meaningful way. _{2}, _{3}, _{4} are the intensities of the colour layers red, green and blue. The variable _{5} is the entropy for the co-occurrence matrix for texture of the chosen layer and it is computed from _{5} , then, was the sum of all four entropy values. Further augmenting the distance was not found to be justified by the additional computational burden. _{6} : represents the contrast in texture given by _{5}). The four variables _{7}, _{8}, _{9}, _{10} are computed by the Euclidean distance between the texture of the start pixel sp, which is set by the user (the start pixel will be discussed more in the next section). The actual pixel ap is the pixel for the calculated

^{sp}^{ap}_{7}, _{8}, _{9}, _{10}. The B vector is normalised.

To reach a decision about which pixels are inside or outside the region, we used the ToE [

The essence of the technique in using the ToE is the assignment of a so-called “mass of evidence” μ for various labelling propositions for a pixel. The total mass of evidence available for the allocation of candidate labels for the pixel is unity. In this paper there are only two cases: (1) the pixel belongs to the region, ω, and (2) the pixel does not belong to the region, ϖ; there is also uncertainty _{1} and μ_{2} of information for classifying a pixel x, these could be, for example, the Kullback layer and the entropy feature.

_{AB} be any binary product of the form:

Let us clarify how this is done, by way of a classification example. In our case, we want to know if a pixel is inside or outside a region, so let us start with _{1} , the Kullback variable of image data. The variable _{1} labels pixels as belonging to one of two classes: ω(_{1} value between the seed and the actual pixel, whereas a value of 1 would mean that the pixel values are completely different. However, suppose we are a little uncertain about the labelling process or even the quality of the data itself, so that we are only willing to commit ourselves to classifying the pixel with a 90% level of confidence. Thus, we are about 10% uncertain of the labelling. Using the symbolism of the ToE, the distribution of the unit mass of evidence over the two possible labels and our uncertainty regarding the labelling are expressed in the following way:
_{x1} is used to signify the uncertainty in the labelling. Thus, the mass of evidence assigned to label ω as being correct for the pixel is 0.63.

Let us see how the ToE is able to cope with the problem of multi-source data. Suppose a second data source is available for our example: the second value of our B-difference vector, _{2}, has the value 0.8, and this time we are about 25% uncertain of the labelling. This means that for any particular pixel, we should suppose the mass of evidence after analysing the second data is

Thus, the second analysis seems to be favouring _{2}, as the correct label for the pixel would signify that the pixel is outside of the region. The ToE now allows the two mass distributions to be merged, in order to combine the evidence and thus come up with a label that is jointly preferred by both sources together, and for which the overall uncertainty should be reduced. This is done through the mechanism of the orthogonal sum,

In order that the final mass distribution sums to unity, a normalising denominator is computed. This denominator is the sum of the areas of all the rectangles with some value (see

After the calculation of the resulting (i.e., combined evidence) mass distribution, the class

For cases using more than two sources (such as our case), the orthogonal sum can be applied repetitively, since the orthogonal sum is both commutative and associative, as said above.
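The combination of two sources can be sketched with a small worked example. It assumes that a difference value d with uncertainty u is turned into masses (1-d)(1-u) for "inside", d(1-u) for "outside" and u for the uncertainty, and that the first difference value is 0.3; this assumption reproduces the mass of 0.63 quoted in the text for the first source. The second source uses the value 0.8 with 25% uncertainty, as in the text. The function names and dictionary keys are hypothetical.

```python
def mass_from_difference(diff, uncertainty):
    """Spread the unit mass over 'in', 'out' and the uncertainty 'theta'."""
    committed = 1.0 - uncertainty
    return {"in": (1.0 - diff) * committed,
            "out": diff * committed,
            "theta": uncertainty}

def orthogonal_sum(m1, m2):
    """Dempster's orthogonal sum for the two-label frame {in, out}."""
    # Mass falling on contradictory label pairs is discarded...
    conflict = m1["in"] * m2["out"] + m1["out"] * m2["in"]
    norm = 1.0 - conflict                     # ...and the rest is renormalised.
    if norm <= 0.0:
        raise ValueError("sources are in total conflict")
    return {
        "in": (m1["in"] * m2["in"] + m1["in"] * m2["theta"]
               + m1["theta"] * m2["in"]) / norm,
        "out": (m1["out"] * m2["out"] + m1["out"] * m2["theta"]
                + m1["theta"] * m2["out"]) / norm,
        "theta": m1["theta"] * m2["theta"] / norm,
    }

m1 = mass_from_difference(0.3, 0.10)   # assumed first value; gives 0.63 for 'in'
m2 = mass_from_difference(0.8, 0.25)   # second source as described in the text
combined = orthogonal_sum(m1, m2)
```

Under these assumptions the combined masses are approximately 0.459 (inside), 0.498 (outside) and 0.043 (uncertainty), so the combined uncertainty is lower than that of either source. Further sources can be folded in by calling `orthogonal_sum` again on the result, in any order.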

After the orthogonal sum has been applied, the decision as to whether the pixel is inside

There are many image segmentation approaches, such as clustering, boundary detection, level-set methods and active contours, region growing, etc. Clustering or characteristic feature thresholding, like the popular k-means algorithm, usually does not consider spatial information. Boundary detection achieves good results for simple, noise-free images, but its weak point is noisy, complex images. Edge detection often produces missing edges and even extra edges, so that the detected boundaries do not necessarily form a set of closed, connected curves surrounding connected regions. Region growing has the advantage of exploiting spatial information and guarantees the formation of closed, connected regions (due to its very principle).

However, region growing is not without its problems – the main ones being the difficulty in finding the right point to start (i.e., the “seed point”) and in knowing when the region growing process should be terminated. As a result of the latter, in particular, what is generated could be under- or over-segmented. There are several papers that try to solve these problems in region growing, where the researchers worked with several regions that grow at the same time, and then they merged all similar regions together. Tilton [

What is presented in this paper is the development of a region growing algorithm to detect cartographic objects, using only information from one image. This means using only information such as grey values or texture. To segment an area, it is important to have significant differences between the area being examined and its surroundings. If such differences are not available, it is quite difficult to segment the area using only information from the image. The algorithm used for region growing is of a recursive type.

Let P be any property of the image – for example, that the grey level is less than 150. The algorithm could be described by the following procedure:

procedure growing(P):
    for each neighbour Q of P:
        if B(P) < B(Q) then growing(Q)
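The growing procedure can be sketched in runnable form. This is an illustration under stated assumptions rather than the authors' implementation: the acceptance test is a simple grey-level tolerance around the seed value (standing in for the ToE-based decision), the neighbourhood is 4-connected, and an explicit stack replaces recursion to avoid Python's recursion limit on large regions. The function name `grow_region` and the parameter `tol` are hypothetical.

```python
import numpy as np

def grow_region(img, seed, tol):
    """Return a boolean mask of the connected region grown from `seed`.

    A pixel joins the region when its grey value differs from the seed value
    by at most `tol` (an assumed stand-in criterion) and it is 4-connected to
    a pixel already in the region.
    """
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(img[seed])
    stack = [seed]                              # explicit stack instead of recursion
    while stack:
        r, c = stack.pop()
        if mask[r, c]:
            continue                            # already accepted
        if abs(float(img[r, c]) - seed_val) > tol:
            continue                            # fails the membership test
        mask[r, c] = True
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and not mask[rr, cc]:
                stack.append((rr, cc))
    return mask

# Two areas of value 100 separated by a brighter column of value 200.
img = np.full((5, 7), 100, dtype=np.uint8)
img[:, 3] = 200
mask = grow_region(img, seed=(2, 1), tol=10)
```

Starting from a seed in the left area, only the 15 pixels to the left of the bright column are selected; the right area has the same grey value but is not connected to the seed, which is the behaviour that makes the result a single closed region.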

Next, we will highlight some of the results.

The next example shows the result calculated on a pan-sharpened Ikonos image with a ground resolution of 1 m. The red layer was used for the calculation of the co-occurrence matrices. The road was detected as a region, but the algorithm did not detect the road loop because of the shadow (

Some cartographic objects are more difficult than others to discern in the extraction process, depending on the stability of colour and texture in the region of interest. For example, the method at hand would detect only the part of a building's roof that forms a homogeneous area with the characteristics of the seed pixel. If one is interested only in detecting a specific type of object (i.e. target detection), the B vector could be chosen to optimize the search for that type of object.

One might expect heavy computational demands due to the recursive nature of the method; however, this turned out not to be the case, and the developed algorithm worked properly in this respect. It took only a few seconds (on a 1 GHz Pentium PC with 500 MB of RAM) to obtain the regions in the images shown in

A new technique for extracting cartographic features from aerial and spatial images has been presented. The ToE has been shown to be an efficient method to fuse information coming from grey level and texture, in order to discriminate whether a pixel is inside or outside a specific region. Several results from real images support the technique. A manual click on an interior point of a region segments that region, which is presented as a binary image. Algorithms for cartographic feature extraction present one important problem; namely, which parameters to choose and how to tune them. In our case, the parameters are the number of thresholds; the window size, distance and number of directions in the co-occurrence matrices; the choice of co-occurrence matrix features; and finally, the choice of features to form the vector B in Section 4. This number of choices makes the algorithm an “ad hoc” method; however, experiments have shown that the only choice relevant to the final segmentation is that of vector B.

Future work will attempt to quantitatively evaluate the performance of the proposed algorithm. One or more references or “ground truth” data sets will be needed (containing the “true” segmentation of the aerial/satellite scenes) so that the performance of the proposed and competing methods can be evaluated. Even though the others are difficult to compare, since they use different kinds of source information, for example LIDAR and multispectral [

This algorithm could be used in today's interactive environments for cartographic feature extraction, although more work should be performed, focussing on the post-processing steps to improve the usefulness of the algorithm's output.

The authors wish to thank the Spanish MCYT for financial support; project number CGL2006-07132/BTE.

Algorithm.

The upper left image is a pan-sharpened Ikonos image. The upper right figure shows its histogram for the red band. Minimizing the Kullback distance found three threshold levels: 56, 87, and 121. The lower image is the result of thresholding with these levels.

Orthogonal sum of the Theory of Evidence.

Searching directions of the recursive region growing algorithm around the actual pixel. The numbers show how the recursive algorithm steps forward to scan the region in the image.

(a) Aerial image, with the seed point at the centre of the circle; (b) Result with 0.1 uncertainty; (c) Street and houses at the pixel level, where it can be observed how difficult it is to differentiate between the two cartographic features; and (d) Result for uncertainty 0.01.

Ikonos image region growing. The shadow on the road prevents the process from going into the road loop found in the lower right portion of the picture.