Special Issue "Deep Learning and Computer Vision for GeoInformation Sciences"

Special Issue Editors

Dr. James Haworth
Guest Editor
Department of Civil, Environmental and Geomatic Engineering, UCL, London WC1E 6BT, UK
Interests: GIScience; machine learning; artificial intelligence; computer vision; transport; geocomputation; geosimulation
Prof. Dr. Suzana Dragicevic
Guest Editor
Spatial Analysis and Modeling Lab, Department of Geography, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
Interests: geographic information systems and science (GIS); geosimulation; geographic automata modeling; artificial intelligence; soft computing; geocomputation
Dr. Marguerite Madden
Guest Editor
Center for Geospatial Research, Department of Geography, University of Georgia, Athens, GA 30602, USA
Interests: GIScience and landscape ecology; remote sensing; geovisualization and geospatial analysis for human/animal–environment interactions
Dr. Mingshu Wang
Guest Editor
Faculty of Geo-Information Science and Earth Observation (ITC) of the University of Twente, Department of Geo-information Processing, PO Box 217, 7500 AE Enschede, The Netherlands
Interests: GIScience; geodata science; urban informatics
Dr. Haosheng Huang
Guest Editor
Geographic Information Science (GIS), Department of Geography, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
Interests: GIScience; location based services; geospatial big data analytics

Special Issue Information

Dear Colleagues,

In recent years, significant progress has been made in the combined fields of deep learning (DL) and computer vision (CV), with applications ranging from driverless cars to facial recognition to robotics. There is a natural synergy between the geoinformation sciences and DL and CV due to the vast quantities of geolocated and time-stamped data being generated from various sources, including satellite imagery, street-level images and video, airborne and unmanned aerial system (UAS) imagery, social media data, text data, and other data streams. In the field of remote sensing, in particular, significant progress has already been made in object detection, image classification, and scene classification, amongst other tasks. More recently, DL architectures have been applied to heterogeneous geospatial data types, such as networks, broadening their applicability across a range of spatial processes.

This Special Issue aims to collate the state of the art in deep learning and computer vision for the geoinformation sciences, from the application of existing DL and CV algorithms in diverse contexts to the development of novel techniques. Submissions are invited across a range of topics related to DL and CV, including but not limited to:

Theory and algorithms: Development of novel theory and algorithms specific to the geoinformation sciences, including methods for modelling heterogeneous spatio-temporal data types.
Integration of DL and CV into traditional modelling frameworks: Using DL and CV to augment traditional modelling techniques, e.g., through data creation, fusion, or integrated algorithmic design.
Deep reinforcement learning: Application of deep reinforcement learning to spatial processes.
Geocomputation for DL and CV: Improving the performance and scalability of DL and CV using geocomputational techniques.
Incorporating DL and CV in Geoinformation Science Curricula: Meeting the growing demand for artificial intelligence in education, particularly the geolocational aspects of DL and CV, in GIScience programs as well as in the disciplines where DL/CV are developed (e.g., engineering, computer science) and in the application areas listed below.
Applications: Open scope within the geoinformation sciences (e.g., transport and mobility, smart cities, agriculture, marine science, ecology, geology, forestry, public health, urban/rural planning, infrastructure, disaster management, social networks, local/global modelling, climate and atmosphere, etc.).

Dr. James Haworth
Prof. Dr. Suzana Dragicevic
Dr. Marguerite Madden
Dr. Mingshu Wang
Dr. Haosheng Huang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. ISPRS International Journal of Geo-Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Computer vision
  • Deep learning
  • Convolutional and recurrent neural networks
  • Image classification
  • Object detection
  • Spatiotemporal
  • Urban sensing and urban computing

Published Papers (17 papers)


Research

Article
MCCRNet: A Multi-Level Change Contextual Refinement Network for Remote Sensing Image Change Detection
ISPRS Int. J. Geo-Inf. 2021, 10(9), 591; https://doi.org/10.3390/ijgi10090591 - 07 Sep 2021
Abstract
Change detection based on bi-temporal remote sensing images has made significant progress in recent years, aiming to identify the changed and unchanged pixels between a registered pair of images. However, most learning-based change detection methods only utilize fused high-level features from the feature encoder and thus miss the detailed representations that low-level feature pairs contain. Here we propose a multi-level change contextual refinement network (MCCRNet) to strengthen the multi-level change representations of feature pairs. To effectively capture the dependencies of feature pairs while avoiding fusing them, our atrous spatial pyramid cross attention (ASPCA) module introduces a crossed spatial attention module and a crossed channel attention module to emphasize the position importance and channel importance of each feature while simultaneously keeping the scale of input and output the same. This module can be plugged into any feature extraction layer of a Siamese change detection network. Furthermore, we propose a change contextual representations (CCR) module from the perspective of the relationship between the change pixels and the contextual representation, named change region contextual representations. The CCR module aims to correct changed pixels mistakenly predicted as unchanged by a class attention mechanism. Finally, we introduce an effective sample number adaptively weighted loss to solve the class-imbalanced problem of change detection datasets. On the whole, compared with other attention modules that only use fused features from the highest feature pairs, our method can capture the multi-level spatial, channel, and class context of change discrimination information. The experiments are performed with four public change detection datasets of various image resolutions. 
Compared to state-of-the-art methods, our MCCRNet achieved superior performance on all datasets (i.e., LEVIR, Season-Varying Change Detection Dataset, Google Data GZ, and DSIFN) with improvements of 0.47%, 0.11%, 2.62%, and 3.99%, respectively.
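The "effective sample number adaptively weighted loss" mentioned in the abstract is not spelled out there; the sketch below assumes the common class-balanced weighting of Cui et al. (2019), in which each class is weighted by the inverse of its effective sample count. This is a plausible reading, not the paper's exact formulation.

```python
import numpy as np

def class_balanced_weights(counts, beta=0.9999):
    """Per-class weights from the 'effective number of samples'
    E_c = (1 - beta**n_c) / (1 - beta); weight_c = 1 / E_c,
    normalised so the weights sum to the number of classes."""
    counts = np.asarray(counts, dtype=float)
    effective = (1.0 - beta ** counts) / (1.0 - beta)
    w = 1.0 / effective
    return w * len(counts) / w.sum()

# Change detection is typically binary and heavily skewed:
w = class_balanced_weights([95000, 5000])  # unchanged vs. changed pixels
```

The rarer "changed" class receives the larger weight, counteracting the class imbalance the abstract describes.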
(This article belongs to the Special Issue Deep Learning and Computer Vision for GeoInformation Sciences)

Article
Development of a City-Scale Approach for Façade Color Measurement with Building Functional Classification Using Deep Learning and Street View Images
ISPRS Int. J. Geo-Inf. 2021, 10(8), 551; https://doi.org/10.3390/ijgi10080551 - 16 Aug 2021
Abstract
Precise measuring of urban façade color is necessary for urban color planning. The existing manual methods of measuring building façade color are limited by time and labor costs and can hardly be carried out on a city scale. These methods also make it challenging to identify the role of the building function in controlling and guiding urban color planning. This paper explores a city-scale approach to façade color measurement with building functional classification using state-of-the-art deep learning techniques and street view images. Firstly, we used semantic segmentation to extract building façades and conducted color calibration of the photos to pre-process the collected street view images. Then, we proposed a color chart-based façade color measurement method and a multi-label deep learning-based building classification method. Next, field survey data were used as the ground truth to verify the accuracy of the façade color measurement and building function classification. Finally, we applied our approach to generate façade color distribution maps with the building classification for three metropolises in China, and the results proved the transferability and effectiveness of the scheme. The proposed approach can provide city managers with an overall perception of urban façade color and building function across city-scale areas in a cost-efficient way, contributing to data-driven decision making for urban analytics and planning.
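As a rough illustration of the measurement step (the arrays and colour chart here are hypothetical; the paper's full pipeline also involves semantic segmentation and colour calibration), the façade colour of a masked region can be taken as its mean RGB and snapped to the nearest chart entry:

```python
import numpy as np

def facade_color(image, mask, chart):
    """Mean RGB over facade pixels, snapped to the nearest colour-chart
    entry (Euclidean distance in RGB; a perceptual space such as CIELAB
    would be a natural refinement)."""
    mean_rgb = image[mask].mean(axis=0)
    name = min(chart, key=lambda k: np.sum((mean_rgb - chart[k]) ** 2))
    return mean_rgb, name

# Toy example: a 4x4 image whose top half is a facade.
img = np.zeros((4, 4, 3), dtype=float)
img[:2] = [205, 170, 125]
m = np.zeros((4, 4), dtype=bool)
m[:2] = True
chart = {"beige": np.array([210, 180, 140]), "slate": np.array([112, 128, 144])}
```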

Article
Semantic Relation Model and Dataset for Remote Sensing Scene Understanding
ISPRS Int. J. Geo-Inf. 2021, 10(7), 488; https://doi.org/10.3390/ijgi10070488 - 17 Jul 2021
Abstract
A deep understanding of our visual world is more than an isolated perception of a series of objects; the relationships between them also contain rich semantic information. Especially for satellite remote sensing images, the span is so large that the various objects are always of different sizes and complex spatial compositions. Therefore, the recognition of semantic relations is conducive to strengthening the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN) based on an attentional mechanism to fuse and refine multi-scale semantic context, which is crucial to strengthen the cognitive ability of our model. Besides, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module to remove meaningless connections among entities and improve the efficiency of scene graph generation. Meanwhile, to further promote the research of scene understanding in the remote sensing field, this paper also proposes a remote sensing scene graph dataset (RSSGD). We carry out extensive experiments, and the results show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.

Article
A Cost Function for the Uncertainty of Matching Point Distribution on Image Registration
ISPRS Int. J. Geo-Inf. 2021, 10(7), 438; https://doi.org/10.3390/ijgi10070438 - 25 Jun 2021
Abstract
Computing the homography matrix using the known matching points is a key step in computer vision for image registration. In practice, the number, accuracy, and distribution of the known matching points can affect the uncertainty of the homography matrix. This study mainly focuses on the effect of matching point distribution on image registration. First, horizontal dilution of precision (HDOP) is derived to measure the influence of the distribution of known points on fixed point position accuracy on the image. The quantization function, which is the average of the center points' HDOP over the overlapping region, is then constructed to measure the uncertainty of the matching distribution. Finally, experiments on image registration were performed to verify the proposed function. We tested the consistency of the relationship between the proposed function and the average of symmetric transfer errors. Consequently, the proposed function is appropriate for measuring the uncertainty of matching point distribution on image registration.
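A minimal sketch of how HDOP can quantify point distribution, assuming the standard geodetic definition sqrt(trace((AᵀA)⁻¹)) with unit direction vectors from the evaluated point to the known points (the paper's exact derivation may differ):

```python
import numpy as np

def hdop(known_pts, eval_pt):
    """HDOP of an evaluation point with respect to a set of known
    (matching) points: sqrt(trace((A^T A)^-1)), where the rows of A
    are unit direction vectors from eval_pt to each known point."""
    d = np.asarray(known_pts, dtype=float) - np.asarray(eval_pt, dtype=float)
    A = d / np.linalg.norm(d, axis=1, keepdims=True)
    return float(np.sqrt(np.trace(np.linalg.inv(A.T @ A))))

# Well-spread points give a lower (better) HDOP than clustered ones:
spread = hdop([(1, 0), (0, 1), (-1, 0), (0, -1)], (0, 0))
clustered = hdop([(1, 0.0), (1, 0.1), (1, -0.1), (0.9, 0.0)], (0, 0))
```

The geometric intuition matches the abstract: a clustered set of matching points constrains the homography poorly, which the larger HDOP value reflects.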

Article
Quantifying the Characteristics of the Local Urban Environment through Geotagged Flickr Photographs and Image Recognition
ISPRS Int. J. Geo-Inf. 2020, 9(4), 264; https://doi.org/10.3390/ijgi9040264 - 19 Apr 2020
Abstract
Urban environments play a crucial role in the design, planning, and management of cities. Recently, as the urban population expands, the ways in which humans interact with their surroundings have evolved, presenting dynamic distributions in space and time locally and frequently. Therefore, how to better understand the local urban environment and differentiate varying preferences for urban areas has been a big challenge for policymakers. This study leverages geotagged Flickr photographs to quantify characteristics of varying urban areas and exploit the dynamics of areas where more people assemble. An advanced image recognition model is used to extract features from large numbers of images of Inner London within the period 2013–2015. After the integration of characteristics, a series of visualisation techniques are utilised to explore the characteristic differences and their dynamics. We find that urban areas with higher population densities contain more iconic landmarks and leisure zones, while others are more related to daily life scenes. The dynamic results demonstrate that season determines human preferences for travel modes and activity modes. Our study expands the previous literature on the integration of image recognition methods and urban perception analytics and provides new insights for stakeholders, who can use these findings as vital evidence for decision making.

Article
Quantification Method for the Uncertainty of Matching Point Distribution on 3D Reconstruction
ISPRS Int. J. Geo-Inf. 2020, 9(4), 187; https://doi.org/10.3390/ijgi9040187 - 25 Mar 2020
Abstract
Matching points are the direct data sources of the fundamental matrix, camera parameters, and point cloud calculation. Thus, their uncertainty has a direct influence on the quality of image-based 3D reconstruction and is dependent on the number, accuracy, and distribution of the matching points. This study mainly focuses on the uncertainty of matching point distribution. First, horizontal dilution of precision (HDOP) is used to quantify the feature point distribution in the overlapping region of multiple images. Then, the quantization method is constructed: HDOP̄, the average of 2 × arctan(HDOP × n^(1/5))/π over all images, is utilized to measure the uncertainty of matching point distribution on 3D reconstruction. Finally, simulated and real scene experiments were performed to describe and verify the rationality of the proposed method. We found that the relationship between HDOP̄ and the matching point distribution in this study was consistent with that between matching point distribution and 3D reconstruction. Consequently, it may be a feasible method to predict the quality of 3D reconstruction by calculating the uncertainty of matching point distribution.
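A small sketch of the quantization, assuming HDOP̄ averages 2 × arctan(HDOP × n^(1/5))/π over the images; the n^(1/5) scaling is this sketch's reading of the paper's formula, not a confirmed detail. The arctan squashing maps the unbounded HDOP into [0, 1):

```python
import math

def hdop_bar(hdops, n):
    """Average normalised HDOP over all images, with n the number of
    matching points; 2*atan(.)/pi maps [0, inf) into [0, 1)."""
    return sum(2 * math.atan(h * n ** 0.2) / math.pi for h in hdops) / len(hdops)
```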

Article
Classification and Segmentation of Mining Area Objects in Large-Scale Sparse LiDAR Point Cloud Using a Novel Rotated Density Network
ISPRS Int. J. Geo-Inf. 2020, 9(3), 182; https://doi.org/10.3390/ijgi9030182 - 24 Mar 2020
Abstract
The classification and segmentation of large-scale, sparse LiDAR point clouds with deep learning are widely used in engineering survey and geoscience. The loose structure and the non-uniform point density are the two major constraints to utilizing the sparse point cloud. This paper proposes a lightweight auxiliary network, called the rotated density-based network (RD-Net), and a novel point cloud preprocessing method, Grid Trajectory Box (GT-Box), to solve these problems. The combination of RD-Net and PointNet was used to achieve high-precision 3D classification and segmentation of the sparse point cloud, emphasizing the importance of the density feature of LiDAR points for 3D object recognition of sparse point clouds. Furthermore, RD-Net plus PointCNN, PointNet, PointCNN, and RD-Net alone were introduced as comparisons. Public datasets were used to evaluate the performance of the proposed method. The results showed that RD-Net could significantly improve the performance of sparse point cloud recognition for the coordinate-based networks, improving the classification accuracy to 94% and the segmentation per-point accuracy to 70%. Additionally, the results concluded that point-density information has an independent spatial–local correlation and plays an essential role in the process of sparse point cloud recognition.
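Since the method rests on the density feature of LiDAR points, a naive per-point density estimate can be sketched as follows (brute force for illustration; at LiDAR scale a k-d tree would be used, and RD-Net learns from such cues rather than this exact statistic):

```python
import numpy as np

def point_density(points, radius=1.0):
    """Per-point density: number of neighbours within `radius` of each
    point, computed from the full pairwise squared-distance matrix."""
    pts = np.asarray(points, dtype=float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    return (d2 <= radius ** 2).sum(axis=1) - 1  # exclude the point itself

# Two nearby points and one isolated point:
dens = point_density([(0, 0, 0), (0.5, 0, 0), (5, 5, 5)])
```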

Article
Extracting Representative Images of Tourist Attractions from Flickr by Combining an Improved Cluster Method and Multiple Deep Learning Models
ISPRS Int. J. Geo-Inf. 2020, 9(2), 81; https://doi.org/10.3390/ijgi9020081 - 31 Jan 2020
Abstract
Extracting representative images of tourist attractions from geotagged photos is beneficial to many fields in tourist management, such as applications in touristic information systems. This task usually begins with clustering to extract tourist attractions from raw coordinates in geotagged photos. However, most existing cluster methods are limited in the accuracy and granularity of the places of interest, as well as in detecting distinct tags, due to their primary consideration of spatial relationships. After clustering, the challenge still exists for the task of extracting representative images within the geotagged base image data, because of the existence of noisy photos largely occupied by humans and unrelated objects. In this paper, we propose a framework containing an improved cluster method and multiple neural network models to extract representative images of tourist attractions. We first propose a novel time- and user-constrained density-joinable cluster method (TU-DJ-Cluster), specific to photos with similar geotags, to detect place-relevant tags. Then we merge and extend the clusters according to the similarity between pairs of tag embeddings, as trained from Word2Vec. Based on the clustering result, we filter out noisy images with a Multilayer Perceptron and a single-shot multibox detector model, and further select representative images with the deep ranking model. We select Beijing as the study area. The quantitative and qualitative analysis, as well as the questionnaire results obtained from real-life tourists, demonstrate the effectiveness of this framework.
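The tag-embedding merge step might look like the following sketch (the function name, threshold, and toy embeddings are illustrative, not the paper's):

```python
import numpy as np

def merge_by_tag_similarity(clusters, emb, threshold=0.8):
    """Greedily merge clusters whose tag embeddings are close in cosine
    similarity. clusters: dict tag -> set of photo ids; emb: dict tag -> vector."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    tags = list(clusters)
    merged = {t: set(v) for t, v in clusters.items()}
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if a in merged and b in merged and cos(emb[a], emb[b]) >= threshold:
                merged[a] |= merged.pop(b)  # fold b's photos into a
    return merged

# Toy data: two near-synonymous place tags and one unrelated tag.
clusters = {"forbidden city": {1}, "palace museum": {2}, "hotpot": {3}}
emb = {"forbidden city": [1.0, 0.0], "palace museum": [0.99, 0.1], "hotpot": [0.0, 1.0]}
merged = merge_by_tag_similarity(clusters, emb)
```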

Article
Linguistic Landscapes on Street-Level Images
ISPRS Int. J. Geo-Inf. 2020, 9(1), 57; https://doi.org/10.3390/ijgi9010057 - 20 Jan 2020
Abstract
Linguistic landscape research focuses on relationships between written languages in public spaces and the sociodemographic structure of a city. While a great deal of work has been done on the evaluation of linguistic landscapes in different cities, most of the studies are based on ad hoc interpretation of data collected from fieldwork. The purpose of this paper is to develop a new methodological framework that combines computer vision and machine learning techniques for assessing the diversity of languages from street-level images. As demonstrated with an analysis of a small Chinese community in Seoul, South Korea, the proposed approach can reveal the spatiotemporal pattern of linguistic variations effectively and provide insights into the demographic composition as well as social changes in the neighborhood. Although the method presented in this work is at a conceptual stage, it has the potential to open new opportunities to conduct linguistic landscape research at a large scale and in a reproducible manner. It is also capable of yielding a more objective description of a linguistic landscape than arbitrary classification and interpretation of on-site observations. The proposed approach can be a new direction for the study of linguistic landscapes that builds upon urban analytics methodology, and it will help both geographers and sociolinguists explore and understand our society.

Article
Identification of Salt Deposits on Seismic Images Using Deep Learning Method for Semantic Segmentation
ISPRS Int. J. Geo-Inf. 2020, 9(1), 24; https://doi.org/10.3390/ijgi9010024 - 01 Jan 2020
Abstract
Several areas of Earth that are rich in oil and natural gas also have huge deposits of salt below the surface. Because of this connection, knowing the precise locations of large salt deposits is extremely important to companies involved in oil and gas exploration. To locate salt bodies, professional seismic imaging is needed. These images are analyzed by human experts, which leads to very subjective and highly variable renderings. To motivate automation and increase the accuracy of this process, TGS-NOPEC Geophysical Company (TGS) sponsored a Kaggle competition that was held in the second half of 2018. The competition was very popular, gathering 3221 individuals and teams. Data for the competition included a training set of 4000 seismic image patches and corresponding segmentation masks. The test set contained 18,000 seismic image patches used for evaluation (all images are 101 × 101 pixels). Depth information of the sample location was also provided for every seismic image patch. The method presented in this paper is based on the author's participation in the competition and relies on training a deep convolutional neural network (CNN) for semantic segmentation. The architecture of the proposed network is inspired by the U-Net model in combination with ResNet and DenseNet architectures. To better comprehend the properties of the proposed architecture, a series of experiments were conducted applying standardized approaches within the same training framework. The results showed that the proposed architecture is comparable to and, in most cases, better than these segmentation models.
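Salt-mask predictions in the TGS challenge were scored with IoU-based metrics (the competition averaged precision over several IoU thresholds; the plain intersection-over-union shown here is the building block):

```python
import numpy as np

def iou(pred, truth):
    """Intersection-over-union of two boolean segmentation masks.
    Returns 1.0 when both masks are empty (a common convention)."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    return np.logical_and(pred, truth).sum() / union if union else 1.0
```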

Article
Extracting Building Areas from Photogrammetric DSM and DOM by Automatically Selecting Training Samples from Historical DLG Data
ISPRS Int. J. Geo-Inf. 2020, 9(1), 18; https://doi.org/10.3390/ijgi9010018 - 01 Jan 2020
Abstract
This paper presents an automatic building extraction method which utilizes a photogrammetric digital surface model (DSM) and digital orthophoto map (DOM) with the help of historical digital line graphic (DLG) data. To reduce the need for manual labeling, the initial labels were automatically obtained from historical DLGs. Nonetheless, a proportion of these labels are incorrect due to changes (e.g., new constructions, demolished buildings). To select clean samples, an iterative method using a random forest (RF) classifier was proposed to remove possible incorrect labels. To obtain effective features, deep features extracted from the normalized DSM (nDSM) and the DOM using pre-trained fully convolutional networks (FCN) were combined. To control the computation cost and alleviate the burden of redundancy, the principal component analysis (PCA) algorithm was applied to reduce the feature dimensions. Three data sets in two areas were employed, with evaluation in two aspects. In these data sets, three DLGs with 15%, 65%, and 25% label noise were applied. The results demonstrate that the proposed method can effectively select clean samples and maintain acceptable quality of extracted results in both pixel-based and object-based evaluations.
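The PCA reduction of the concatenated deep features can be sketched with a plain SVD (the dimensions below are illustrative, not the paper's):

```python
import numpy as np

def pca_reduce(X, k):
    """Project feature vectors X (n_samples x n_features) onto the
    top-k principal components via SVD of the centred data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# e.g., 64-D deep features for 100 samples reduced to 8 dimensions:
feats = np.random.default_rng(0).normal(size=(100, 64))
reduced = pca_reduce(feats, 8)
```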

Article
Dynamic Recommendation of POI Sequence Responding to Historical Trajectory
ISPRS Int. J. Geo-Inf. 2019, 8(10), 433; https://doi.org/10.3390/ijgi8100433 - 30 Sep 2019
Abstract
Point-of-Interest (POI) recommendation is attracting increasing attention from researchers because of the rapid development of Location-based Social Networks (LBSNs) in recent years. Differing from other recommenders, which only recommend the next POI, this research focuses on successive POI sequence recommendation. A novel POI sequence recommendation framework, named Dynamic Recommendation of POI Sequence (DRPS), is proposed, which models POI sequence recommendation as a Sequence-to-Sequence (Seq2Seq) learning task; that is, the input sequence is a historical trajectory, and the output sequence is exactly the POI sequence to be recommended. To solve this Seq2Seq problem, an effective architecture is designed based on a Deep Neural Network (DNN). Owing to the end-to-end workflow, DRPS can easily make dynamic POI sequence recommendations by allowing the input to change over time. In addition, two new metrics named Aligned Precision (AP) and Order-aware Sequence Precision (OSP) are proposed to evaluate the recommendation accuracy of a POI sequence, which consider not only the POI identity but also the visiting order. The experimental results show that the proposed method is effective for POI sequence recommendation tasks, and it significantly outperforms baseline approaches like Additive Markov Chain, LORE, and LSTM-Seq2Seq.
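The Aligned Precision metric is described only briefly; one plausible reading (an assumption, not the paper's exact definition) scores the fraction of positions where the recommended and actually visited POIs coincide:

```python
def aligned_precision(recommended, actual):
    """Order-aware match rate: fraction of positions at which the
    recommended POI equals the visited POI. Hypothetical reading of
    the paper's AP metric; the exact alignment rule may differ."""
    n = max(len(recommended), len(actual))
    hits = sum(1 for r, a in zip(recommended, actual) if r == a)
    return hits / n if n else 0.0
```

Unlike a set-based precision, this scores a POI only when it appears at the correct position, so visiting order matters.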

Article
Image Retrieval Based on Learning to Rank and Multiple Loss
ISPRS Int. J. Geo-Inf. 2019, 8(9), 393; https://doi.org/10.3390/ijgi8090393 - 04 Sep 2019
Abstract
Image retrieval using deep convolutional features has achieved the most advanced performance on most standard benchmarks. In image retrieval, deep metric learning (DML) plays a key role, aiming to capture the semantic similarity information carried by data points. However, two factors may impede the accuracy of image retrieval. First, when learning the similarity of negative examples, current methods push all negative pairs to equal distances in the embedding space, so the intraclass data distribution may be missed. Second, given a query, either a fraction of the data points or all of them are incorporated to build the similarity structure, which makes it complex to calculate similarity or to choose example pairs. In this study, to achieve more accurate image retrieval, we propose a method based on learning to rank and multiple loss (LRML). To address the first problem, we learn the ranking sequence and thereby separate the negative pairs from the query image into different distances. To tackle the second problem, we use a positive example in the gallery and negative sets from the bottom five ranked by similarity, thereby enhancing training efficiency. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on three widely used benchmarks.

Article
Using Vehicle Synthesis Generative Adversarial Networks to Improve Vehicle Detection in Remote Sensing Images
ISPRS Int. J. Geo-Inf. 2019, 8(9), 390; https://doi.org/10.3390/ijgi8090390 - 04 Sep 2019
Cited by 15 | Viewed by 1684
Abstract
Vehicle detection based on very high-resolution (VHR) remote sensing images is beneficial in many fields, such as military surveillance, traffic control, and social/economic studies. However, the intricate details of vehicles and the surrounding background in VHR images require sophisticated analysis based on massive data samples, while the amount of reliably labeled training data is limited. In practice, data augmentation is often leveraged to resolve this conflict. Traditional data augmentation strategies use combinations of rotation, scaling, flipping, and similar transformations, and have limited ability to capture the essence of the feature distribution and improve data diversity. In this study, we propose a learning method named Vehicle Synthesis Generative Adversarial Networks (VS-GANs) to generate annotated vehicles from remote sensing images. The proposed framework has one generator and two discriminators, which try to synthesize realistic vehicles and learn the background context simultaneously. The method can quickly generate high-quality annotated vehicle data samples and greatly helps in the training of vehicle detectors. Experimental results show that the proposed framework can synthesize vehicles and their background images with variations and different levels of detail. Compared with traditional data augmentation methods, the proposed method significantly improves the generalization capability of vehicle detectors. Finally, the contribution of VS-GANs to vehicle detection in VHR remote sensing images was demonstrated in experiments on the UCAS-AOD and NWPU VHR-10 datasets using up-to-date target detection frameworks.
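The traditional baseline that VS-GANs improves upon, rotation and flipping, can be sketched in a few lines on a 2-D grid standing in for an image patch (a minimal illustration, not the authors' pipeline); every input yields eight geometric variants, but no genuinely new appearance:

```python
def rot90(img):
    """Rotate a 2-D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def hflip(img):
    """Mirror a 2-D grid horizontally."""
    return [row[::-1] for row in img]

def augment(img):
    """The eight rotation/flip variants of an image patch: the classic
    augmentation set whose diversity is limited to rigid symmetries."""
    out, cur = [], img
    for _ in range(4):
        out.append(cur)
        out.append(hflip(cur))
        cur = rot90(cur)
    return out
```

Because all eight variants share the same pixel content, such augmentation cannot broaden the feature distribution the way a generative model that synthesizes new vehicles against new backgrounds can.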

Article
Using Intelligent Clustering to Implement Geometric Computation for Electoral Districting
ISPRS Int. J. Geo-Inf. 2019, 8(9), 369; https://doi.org/10.3390/ijgi8090369 - 23 Aug 2019
Viewed by 1022
Abstract
Traditional electoral districting is mostly carried out by manual division. This is not only time-consuming and labor-intensive, but it also makes it difficult to maintain the principles of fairness and consistency. Due to specific political interests, objectivity is often distorted, making the results of a representative election controversial. In order to reflect the spirit of democracy, this study uses computing technologies to divide constituencies automatically, applying the concepts of "intelligent clustering" and "extreme arrangement" to overcome many shortcomings of traditional manual division. In addition, various information technologies are integrated to obtain the most feasible solutions within the maximum capabilities of the computing system, without sacrificing the global representation of the solutions. We take Changhua County, Taiwan as a complete example of electoral districting and find better results relative to the official version: a smaller population difference between constituencies, more complete and symmetrical constituencies, and fewer regional controversies. Our results demonstrate that multidimensional algorithms using a geographic information system can solve many problems of block districting and support decisions based on different needs.
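As a generic illustration of the clustering step in automated districting (a plain k-means sketch on block centroids; the article's "intelligent clustering" and "extreme arrangement" procedures are its own and are not reproduced here), spatially compact groups can be formed like this:

```python
def kmeans(points, centers, iters=10):
    """Plain k-means on (x, y) block centroids: assign each block to the
    nearest district seed, then move each seed to its group's mean.
    A stand-in for the far richer clustering the paper describes."""
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            groups[i].append(p)
        new_centers = []
        for i, g in enumerate(groups):
            if g:
                new_centers.append((sum(p[0] for p in g) / len(g),
                                    sum(p[1] for p in g) / len(g)))
            else:
                new_centers.append(centers[i])   # keep an empty seed put
        centers = new_centers
    return centers, groups
```

Real districting must additionally enforce contiguity and near-equal population per district, which is precisely where plain k-means falls short and purpose-built methods are needed.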

Article
Short-Term Prediction of Bus Passenger Flow Based on a Hybrid Optimized LSTM Network
ISPRS Int. J. Geo-Inf. 2019, 8(9), 366; https://doi.org/10.3390/ijgi8090366 - 22 Aug 2019
Cited by 12 | Viewed by 1434
Abstract
The accurate prediction of bus passenger flow is key to public transport management and the smart city. A long short-term memory (LSTM) network, a deep learning method for modeling sequences, is an efficient way to capture the time dependency of passenger flow. In recent years, an increasing number of researchers have sought to apply the LSTM model to passenger flow prediction. However, few pay attention to the optimization procedure during model training. In this article, we propose a hybrid, optimized LSTM network based on Nesterov-accelerated adaptive moment estimation (Nadam) and stochastic gradient descent (SGD). This method trains the model with high efficiency and accuracy, addressing the problems of inefficient training and misconvergence that exist in complex models. We employ the hybrid optimized LSTM network to predict actual passenger flow in Qingdao, China and compare the prediction results with those obtained by non-hybrid LSTM models and conventional methods. In particular, the proposed model brings a 4–20% additional performance improvement over non-hybrid LSTM models. We also tried combinations of other optimization algorithms and applications in different models, finding that switching from Nadam to SGD when optimizing the LSTM is the best choice. The sensitivity of the model to its parameters is also explored, which provides guidance for applying this model to bus passenger flow data modelling. The good performance of the proposed model at different temporal and spatial scales shows that it is robust and effective, and it can provide insightful support and guidance for dynamic bus scheduling and regional coordination scheduling.
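The hybrid schedule, an adaptive Nadam-style phase for fast early progress followed by plain SGD for stable convergence, can be demonstrated on a toy one-parameter problem (minimizing (w - 3)^2; the learning rates, switch point, and loss here are illustrative choices, not the paper's settings):

```python
import math

def train_hybrid(w=0.0, steps=200, switch=100, lr=0.05):
    """Minimise (w - 3)^2 with a Nadam-style adaptive update for the
    first `switch` steps, then plain SGD, mirroring the hybrid
    Nadam-to-SGD schedule described in the abstract."""
    m = v = 0.0
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        g = 2.0 * (w - 3.0)                  # gradient of the toy loss
        if t <= switch:
            # Nadam phase: bias-corrected moments plus a Nesterov term
            m = b1 * m + (1 - b1) * g
            v = b2 * v + (1 - b2) * g * g
            m_hat = m / (1 - b1 ** t)
            v_hat = v / (1 - b2 ** t)
            nesterov = b1 * m_hat + (1 - b1) * g / (1 - b1 ** t)
            w -= lr * nesterov / (math.sqrt(v_hat) + eps)
        else:
            # SGD phase: small, well-behaved steps near the optimum
            w -= lr * g
    return w
```

The adaptive phase covers most of the distance to the optimum quickly; the SGD tail then contracts the remaining error geometrically, which is the intuition behind switching rather than running either optimizer alone.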

Article
Speed Estimation of Multiple Moving Objects from a Moving UAV Platform
ISPRS Int. J. Geo-Inf. 2019, 8(6), 259; https://doi.org/10.3390/ijgi8060259 - 31 May 2019
Cited by 7 | Viewed by 1611
Abstract
Speed detection of moving objects using an optical camera has long been an important subject of study in computer vision. It is one of the key components in many application areas, such as transportation systems, military and naval applications, and robotics. In this study, we implemented a speed detection system for multiple moving objects on the ground observed from a moving platform in the air. A detect-and-track approach is used for primary tracking of the objects: Faster R-CNN (region-based convolutional neural network) is applied to detect the objects, and a discriminative correlation filter with CSRT (channel and spatial reliability tracking) is used for tracking. Feature-based image alignment (FBIA) is performed for each frame to obtain the proper object location. In addition, SSIM (structural similarity index measurement) is computed to check how similar the current frame is to the object detection frame. This measurement is necessary because the platform is moving, and new objects may be captured in a new frame. We achieved a speed accuracy of 96.80% with our framework with respect to the real speed of the objects.
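The geometry underlying this kind of system reduces to a simple relation: subtract the camera's own motion (recovered by frame alignment) from the tracked pixel displacement, convert pixels to metres via the ground sampling distance, and divide by the frame interval. A minimal sketch, with illustrative names and values that are not taken from the paper:

```python
def ground_speed(track_px, fps, gsd_m, platform_shift_px=(0.0, 0.0)):
    """Estimate object speed in m/s from its pixel positions in two
    consecutive frames. platform_shift_px is the camera's apparent
    motion between the frames (from image alignment); gsd_m is the
    ground sampling distance in metres per pixel."""
    (x0, y0), (x1, y1) = track_px
    dx = (x1 - x0) - platform_shift_px[0]   # motion relative to ground
    dy = (y1 - y0) - platform_shift_px[1]
    dist_m = gsd_m * (dx * dx + dy * dy) ** 0.5
    return dist_m * fps                      # metres per frame * frames/s
```

For instance, an object displaced 13 px while the platform itself shifted 3 px, at 0.1 m/px and 30 fps, moves 1 m per frame interval, i.e. 30 m/s. The accuracy of the overall system hinges on how well the alignment step isolates the platform's motion.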
