Alike Scene Retrieval from Land-Cover Products Based on the Label Co-Occurrence Matrix (LCM)

Abstract: The management and application of remotely sensed data have become much more difficult due to the dramatically growing volume of remotely sensed imagery. To address this issue, content-based image retrieval (CBIR) has been applied to remote sensing image retrieval for information mining. As a consequence of the growing volume of remotely sensed imagery, the number of different types of image-derived products (such as land use/land cover (LULC) databases) is also increasing rapidly. Nevertheless, only a few studies have addressed the exploration and information mining of these products. In this letter, to make the most of the LULC map, we propose an approach for retrieving alike scenes from it. Based on the proposed approach, we design a content-based map retrieval (CBMR) system for LULC. The main contributions of our work are as follows. Firstly, the proposed system allows the user to select a region of interest of variable shape and size as the reference scene; in contrast, in traditional CBIR/CBMR systems, the region of interest is usually of a fixed size equal to the size of the analysis window used for feature extraction. In addition, the user can acquire various retrieval results by specifying the corresponding parameters. Finally, by combining the signatures in the base signature library, the user can acquire the retrieval result faster.


Introduction
Alongside the rapid development of remote sensing platforms and sensors, the volume of remotely sensed imagery has increased tremendously. Because of the large data volumes, exploration and information mining of remote sensing archives are becoming increasingly difficult. Content-based image retrieval (CBIR) is one of the most active research areas in computer vision. CBIR is also widely used in remote sensing image retrieval, as it does not require the presence of semantic tags, which are rarely available and expensive to assign. In order to improve the low degree of utilization of the existing large archives of remotely sensed imagery, many studies [1][2][3][4][5][6] have addressed the application of CBIR to remotely sensed imagery. Li et al. [7] proposed a remote sensing image retrieval approach that adopts convolutional neural networks to extract unsupervised features. Demir and Bruzzone [8,9] introduced hashing methods for large-scale remote sensing (RS) retrieval problems to provide highly time-efficient and accurate search capability within huge data archives. Aptoula [10,11] applied global morphological texture descriptors to the problem of content-based remote sensing image retrieval. Local description strategies and visual vocabularies have been widely adopted in remote sensing content-based retrieval and scene classification [12][13][14][15][16][17][18][19][20].

Methodology
This section provides a summary of the proposed approach for the retrieval of alike land-cover scenes from the NLCD 2006 database [31,32] (see Figure 1), which is shown by a flow chart in Figure 2. NLCD 2006 is a 16-class land cover classification product that covers the United States at a spatial resolution of 30 m.
The process can be divided into two main parts: (1) the off-line building of the signature library from the original LULC database; and (2) computation of the similarity between the selected reference scene and the scenes in the database with a given size (width, height, and offset). Constructing the signature library is the foundation of the retrieval system.

Label Co-Occurrence Matrix
The scene signature that we propose is inspired by the grey-level co-occurrence matrix (GLCM) [35], which is used to extract the texture features of an image. The principle of the GLCM has also been applied to other tasks and data types. Barnsley et al. [36] designed a kernel-based procedure referred to as SPARK (spatial reclassification kernel) to examine both the frequency and the spatial arrangement of the class (land-cover) labels within a square kernel. To improve the classification of buildings, Zhang et al. [37] adopted modified co-occurrence matrix-based filtering. In the method proposed in [38], pairs of neighboring cells are regarded as primitive features, which are extracted and counted to form a histogram representing the pattern of a tile. The GLCM has also been used to describe landscape structure [39]. In this letter, based on the same idea, we propose the label co-occurrence matrix (LCM) for a LULC scene, which is defined as:

H(i, j | σ_x, σ_y) = card{(x, y) : I(x, y) = i, I(x + σ_x, y + σ_y) = j},

where card{A} represents the number of elements contained in set A, and i and j correspond to the labels of pixels (x, y) and (x + σ_x, y + σ_y) in the scene. For a scene I, the co-occurrence matrices are computed at four orientations:

(σ_x, σ_y) ∈ {(d, 0), (0, d), (d, d), (d, -d)},

where d is the distance parameter, which we set to 1. It can be seen that the LCM contains not only the first-order statistics of the labels in a scene (e.g., the histogram), but also second-order statistics (i.e., the spatial relationships between adjacent pixels). In order to obtain a (relatively) rotation-invariant feature (orientation information is not important in remote sensing), we sum the four co-occurrence matrices. For a scene I, its LCM is then:

H(i, j) = Σ over the four orientations of H(i, j | σ_x, σ_y),

where n is the number of labels in the given scene, and H is a symmetric matrix of dimension n × n. Owing to the symmetry of H, we can use only the lower-triangular elements to form a vector V (of dimension n(n + 1)/2).
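The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the four offsets follow the standard GLCM orientations, and the matrix is symmetrized by adding its transpose (an assumption about how the summed matrix becomes symmetric).

```python
import numpy as np

def label_cooccurrence_matrix(scene, n_labels, d=1):
    """Sketch of the LCM: count label pairs at the four offsets
    (d, 0), (0, d), (d, d), (d, -d), sum them, then symmetrize.
    `scene` is a 2-D integer array with labels in [0, n_labels)."""
    H = np.zeros((n_labels, n_labels), dtype=np.int64)
    rows, cols = scene.shape
    for dy, dx in [(d, 0), (0, d), (d, d), (d, -d)]:
        for y in range(rows):
            for x in range(cols):
                y2, x2 = y + dy, x + dx
                if 0 <= y2 < rows and 0 <= x2 < cols:
                    H[scene[y, x], scene[y2, x2]] += 1
    return H + H.T  # symmetrize: the lower triangle now carries all counts

def lcm_signature(H):
    """Flatten the lower triangle into a vector of length n*(n+1)/2."""
    return H[np.tril_indices(H.shape[0])]
```

For a 2 × 2 scene with labels [[0, 1], [1, 0]] and two classes, the signature has 2 × 3 / 2 = 3 entries, matching the n(n + 1)/2 dimension stated above.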

Similarity Measure
A number of different similarity measures, including the Manhattan distance, the Hausdorff distance, the Mahalanobis distance, the Kullback-Leibler divergence [40], the Jeffrey divergence [41], etc., have been proposed for CBIR. In this letter, we chose the cosine similarity, which always lies within the range of 0 to 1 because the elements of the signature vectors are non-negative. The scenes are described by a signature vector computed from the LCM. Given two scenes k and l with corresponding signature vectors F_k and F_l, computed using Equation (9), the similarity between the two scenes is defined as:

Sim(k, l) = (F_k · F_l) / (||F_k|| ||F_l||).

Determining the Signature of the Candidate Scene by Combining the Signatures in the Base Signature Library

In contrast to traditional retrieval systems, in which the reference scene can only be of a fixed size, we enable the user to select the reference scene more flexibly (see Figure 3b). The sizes of the retrieved scenes can also be varied according to the user's requirements in each retrieval. The ability to return scenes of varied sizes relies on the computation of the scene signature, whose construction is explained in detail below. Suppose that the database is a land-cover map of 100 × 100 pixels. We first divide the whole map into small patches of 10 × 10 pixels without overlap. The LCM is then computed for each patch and stored as the base signature library. Suppose that a region of interest (ROI) of 20 × 20 pixels is selected as the reference scene, and the user wants to obtain scenes of the same size with similar content. We first compute the LCM for the selected reference scene. For the signatures of the scenes in the database, we combine the adjacent 2 × 2 base signatures in the library into the signature of each candidate scene. The signature of the reference scene and the combined signatures in the database are then compared by the similarity measure to return the retrieval results.
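The cosine similarity between two signature vectors can be sketched as follows (a minimal illustration with hypothetical function names; the zero-vector guard is an assumption, since an empty scene is not discussed in the text):

```python
import numpy as np

def cosine_similarity(f_k, f_l):
    """Cosine similarity between two LCM signature vectors.
    Because the counts are non-negative, the result lies in [0, 1]."""
    f_k = np.asarray(f_k, dtype=float)
    f_l = np.asarray(f_l, dtype=float)
    denom = np.linalg.norm(f_k) * np.linalg.norm(f_l)
    return float(f_k @ f_l / denom) if denom else 0.0
```

Identical (or proportional) signatures score 1, and signatures with no shared non-zero component score 0.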
A sketch of this retrieval procedure is shown in Figure 3a.
To a first approximation, the combined signature is the sum of the base signatures:

V = V_1 + V_2 + V_11 + V_12,

where V_1, V_2, V_11, and V_12 correspond to the features of base blocks 1, 2, 11, and 12. This sum, however, omits the features of the regions adjacent to each block (the label pairs that straddle the block boundaries). We now describe how to obtain the features of these regions in detail. As shown in Figure 3a, the adjacent regions can be divided into three parts: two columns C, with top-left and bottom-right coordinates (1, 10) and (20, 11); two rows R, with top-left and bottom-right coordinates (10, 1) and (11, 20); and a square S consisting of the four pixels (10, 10), (10, 11), (11, 10), and (11, 11).
where V_R, V_C, and V_S correspond to the LCM features of R, C, and S. The complete LCM feature of the candidate scene should then be:

V = V_1 + V_2 + V_11 + V_12 + V_C + V_R - V_S,

where V_S is subtracted because the pairs within S are counted in both V_C and V_R. The above example explicitly illustrates the principle of combining the signatures in the signature library to obtain the signature of a new scene. Suppose that the size of the base scene is m × m pixels. Based on the base signature library, we can set the width, height, and offset of the sliding window to k, l, and t, where k, l = m, 2m, ...; t = m, 2m, ...; and t ≤ min(k, l).

Retrieval of Alike Scenes
In [28], the authors used an overlapping sliding window to search for alike scenes across the whole database. In this letter, we adopt a similar strategy. However, it can often occur that several scenes that are very close to each other all appear in the retrieval results, because the size of the retrieved scene is larger than the sliding window. In this case, we propose to merge nearby scenes when displaying the retrieval results, in order to show the overall look of the retrieved scenes. More specifically, for each query, we first retrieve p similar scenes of the same size as the sliding window from the database (hereafter called "small scenes"). Among the p small scenes, those with overlapping areas are then merged. The merged scenes (hereafter called "large scenes") are larger than the sliding window and are not necessarily square. We attempt to retrieve six large scenes for the display of the results; if the p small scenes are not enough to form six large scenes, we retrieve more small scenes from the database. For comparison, the same operation was applied to the retrieval results returned by the approach proposed in [28]. In order to obtain the similarity map, we applied a simple normalization operation (see Equation (14)) to the cosine similarity, where d_cos is the cosine distance between the reference scene and the scenes in the database, and Sim is the similarity after normalization.
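The merging of overlapping small scenes can be sketched as follows. This is not the authors' implementation: the union-find grouping and the min-max reading of the normalization are assumptions, and the function names are hypothetical.

```python
def min_max_normalize(values):
    """One plausible reading of the 'simple normalization' mentioned in
    the text (Equation (14) is not reproduced here): linearly rescale
    the similarity values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def merge_overlapping_windows(windows, size):
    """Merge retrieved 'small scenes' whose areas overlap into 'large
    scenes', returned as bounding boxes (top, left, bottom, right).
    `windows` holds (row, col) top-left corners of size x size windows;
    grouping uses union-find over pairwise overlap."""
    parent = list(range(len(windows)))

    def find(a):                      # path-halving union-find
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in range(len(windows)):
        for b in range(a + 1, len(windows)):
            (r1, c1), (r2, c2) = windows[a], windows[b]
            if abs(r1 - r2) < size and abs(c1 - c2) < size:  # areas overlap
                parent[find(a)] = find(b)

    groups = {}
    for idx, w in enumerate(windows):
        groups.setdefault(find(idx), []).append(w)

    return [(min(r for r, _ in g), min(c for _, c in g),
             max(r for r, _ in g) + size, max(c for _, c in g) + size)
            for g in groups.values()]
```

Two 100-pixel windows whose corners are 50 pixels apart merge into one 150 × 150 bounding box, while a distant window stays separate, which matches the "not necessarily square" large scenes described above.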

Retrieval Performance
Since no ground-truth data exist for the dataset we used, it is difficult to apply the common ways of evaluating retrieval performance, even though it can still be judged visually whether the retrieved scenes are similar to the reference scene. Here, we evaluate the retrieval performance from two aspects: (1) the number of similar scenes appearing in the top retrievals; and (2) the ranking order of the similar scenes.

Experiments and Analysis
To demonstrate the validity of the proposed method, a number of comparison experiments were carried out. The whole database consists of a large land-cover map of 161,190 × 104,424 pixels. We first divided the whole map into small patches of 100 × 100 pixels without overlap. For the first three comparison experiments, three different scenes from the NLCD 2006 data set were chosen as the reference scenes. We adopted an overlapping sliding window of 500 × 500 pixels, where each time the window slid 100 pixels in either the horizontal or vertical direction. Since the NLCD database has a spatial resolution of 30 m/pixel, each window covers 15 km × 15 km on the ground. In the final experiment, we tried to retrieve scenes containing an airport.
The first reference scene selected is located in Kaufman County, Texas (see Figure 4). As can be seen, the scene is traversed by the Cedar Creek Reservoir, which is classified as "open water". This could be the most noticeable information for determining the similarity between the reference and the retrieved scenes. Along the reservoir, there are a few classes belonging to the "developed" category. In addition to the "open water" and "developed" categories, the two land-cover types "pasture/hay" and "deciduous forest" occupy most of the remaining regions of this scene. The results of the retrieval experiment acquired by the proposed method and by the method proposed in [28] are shown in Figures 5 and 6, respectively. The first-ranked scenes acquired by both methods are the same, and lie in the same area as the reference scene. Figures 5 and 6 reveal that the ranking order of the proposed method is more reasonable than that of the method proposed in [28]. Judged from the visual aspect alone, the last two scenes retrieved by the method proposed in [28] are quite different from the reference scene, whereas the first four scenes retrieved by the proposed method are very similar to it. In Figures 7 and 8, the remotely sensed images correspond to the retrieved scenes shown in Figures 5 and 6; the images were acquired from Google Earth, using the geographical coordinates shown in Table 1.

The most useful scene retrieval results are the similarity maps of the whole dataset compared with the reference scene. Figure 9 shows the similarity maps computed by the proposed method and by the method proposed in [28], and Figure 10 shows the histograms of the similarity maps in Figure 9. It can be seen that although the visual aspects of the two similarity maps differ, high similarity values are rare in both maps.
In addition, the spatial distributions of both maps are correlated (i.e., the locations with high values in Figure 9a also have high values in Figure 9b). However, according to Figures 5 and 6, the rank order of the most similar scenes obtained by the proposed method seems more reasonable, which can help the user obtain the desired scenes from the large database more efficiently.

The second reference scene considered in our experiment covers Yucatan Lake, which is located in the northeastern part of the U.S. state of Louisiana (Figure 11). The center of this scene is crossed by the Mississippi River, and the western part is divided into two parts by Lake Bruin, both of which are classified as "cultivated crops". Both the Mississippi River and Lake Bruin are classified as "open water". Both sides of the Mississippi River are covered with "woody wetlands", and "deciduous forest" occupies almost the whole southeastern part of the reference scene. Figures 12 and 13 show the results of this retrieval, and the geographical coordinates of the retrieved scenes are presented in Table 2. The top five scenes given by the method proposed in [28] are all included in the results of the proposed method, except that the ranking order of the scenes differs and the size of the scenes also varies. Again, the similarity maps obtained by the two methods (see Figure 14) are spatially correlated; Figure 15 shows the histograms of the similarity maps in Figure 14. However, the ranking order of the results is better with the proposed method, and the last scene obtained by the method proposed in [28] (Figure 13f) has different content.

The third reference scene we selected is located in the northeastern part of the American state of Utah (Figure 16). The upper part of the reference scene lies within Springville, while the majority of it lies within Mapleton. Compared with the two previous scenes, the third reference scene is more heterogeneous.
As seen from Figure 16, the western parts of the reference scene are urban areas, and the eastern parts are mountainous. The western parts are almost entirely occupied by the "developed" category at different intensities; apart from the "developed" category, the land-cover type "pasture/hay" occupies the remaining regions of the western parts. Within the eastern parts, the most common land-cover types are "evergreen forest", "deciduous forest", "shrub/scrub", and "barren land", which are typical of mountain landscapes. Figures 17 and 18 show the results acquired by the different methods. The geographical coordinates of the retrieved scenes are presented in Table 3, where the coordinates in brackets were acquired by [28] and the others are the results of the proposed method. The top four scenes given by the method proposed in [28] are the same as the results of our proposed method, and the ranking order of the scenes is also identical, except that the size of the scenes varies. For the last two scenes retrieved by each method (see Figures 17 and 18), judged from the visual aspect, the scenes acquired by the proposed method are much more similar to the reference than those of [28]. The distributions of areas with relatively high similarity are alike in Figure 19a,b, so the similarity maps obtained by the two methods (see Figure 19) are spatially correlated. Figure 20 shows the histograms of the similarity maps in Figure 19.

In the last experiment, we tried to extract scenes with concrete semantics from the NLCD database. Due to the coarse resolution of the NLCD dataset (30 m), few ground objects are visible.
It was observed that the airstrips in the NLCD 2006 dataset are usually represented by developed areas of medium and high intensities. We also observed that the neighbourhood regions of the airstrips are always filled with open space. Based on this information, we could try to retrieve scenes containing an airport.

A scene (see Figure 21a) containing Arkansas International Airport was selected as the reference scene. We set the size of the sliding window to 100 × 100 pixels and the step size to 100 pixels. Figures 21 and 22 show the top 12 similar scenes returned by each method; the content in the brackets is the geographic coordinates of the scenes. Apart from the similarity map (Figure 23) and the histograms of the similarity maps (Figure 24), we also returned the ranking of the similarity degree, so each scene can be inspected from the top-ranking scene downwards. In order to give an authentic look at the scenes, rather than the classified map, the geographic coordinates of each scene were returned, which allowed us to locate the scenes in Google Earth. Among the top 12 most similar scenes, six (see Figure 21a-c,e,i,k) contain an airport, corresponding respectively to Arkansas International Airport, Indianapolis International Airport, Monroe Regional Airport, South Bend Airport, Salina Municipal Airport, and Valley International Airport. Meanwhile, with the method proposed in [28], only three airports (Arkansas International Airport, Orangeburg Municipal Airport, and Fort Wayne Municipal Airport; Figure 22a,i,k) were returned among the top 12 scenes. Furthermore, we compared the precision of the top 96 scenes retrieved by the proposed method and by the method proposed in [28]: the proposed method obtained 29 airports, while the method proposed in [28] obtained only 8.

Discussion
In this study, we propose a CBIR-like remotely sensed scene retrieval system for use with LULC databases. Motivated by the GLCM, we propose a scene signature extraction method for LULC scenes. To validate our method, four reference scenes were selected for the comparison experiments. For the first three reference scenes, we evaluated the performance of our method in two aspects: the content of the retrieved scenes and the ranking order. As depicted in Figures 5, 6, 12, 13, 17 and 18, the scenes returned by the proposed method appear much more similar to the reference than those of [28]. Additionally, the rank order of the most similar scenes obtained by the proposed method seems more reasonable, which can help the user obtain the desired scenes from the large database more efficiently. For the last reference scene, we selected a scene containing an airport. Among the top twelve most similar scenes, our method returned six scenes containing an airport, while [28] returned only three. Furthermore, we compared the precision of the top 96 scenes retrieved by the proposed method and by the method proposed in [28]: the proposed method obtained 29 airports, while the method proposed in [28] acquired only 8.
To achieve a finer retrieval result, we propose the LCM as the scene signature. Through the designed CBMR system, the user can obtain not only alike scenes at varied scales, but also the similarity map. Moreover, the geographical coordinates of the alike scenes are also presented, so the scenes can easily be located in Google Earth or other map tools. Per the results in Section 3, the proposed method achieves better performance: its retrieval results are more accurate and reasonable. Some future work is also expected. First, the proposed method can be extended to ordinary remotely sensed images rather than land-cover products; a rough classification of the images, converting the grey-valued (or multi-spectral) images into label images, is required before retrieval. In addition, we intend to integrate relevance feedback into our system. With relevance feedback, users would be able to indicate to the retrieval system whether the retrieved results are "relevant", "irrelevant", or "neutral", and the retrieval results would then be refined iteratively, helping the user obtain much more accurate results. Another aspect to take into consideration is complex semantic information; for example, a large open space with many cars is likely to be a parking lot.

Summary and Conclusions
In this letter, we have introduced an effective approach for retrieving alike scenes in large category-valued geospatial data sets. The LCM is used as the scene signature, which not only describes the distribution of the labels in a scene, but also captures the spatial relations between the different labels. The experimental results show the effectiveness of the proposed method. By constructing the base signature library, much time can be saved, as the signatures of the scenes do not need to be recalculated. Furthermore, our CBMR system supports reference scenes of irregular shape. Finally, benefiting from the LCM, we can compare the similarity of scenes of different sizes; based on this principle, we can acquire retrieval results at varied scales, not only at the size of the reference scene. This increases the efficiency with which users acquire desired scenes from a large database.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: