Urban areas are the centers of human settlement, with intensive anthropogenic activities and dense built-up infrastructure, and they are undergoing significant changes in population, land use, industrial production, and so on. Urbanization-induced environmental pollution, climate change, and ecosystem degradation are research hotspots that are closely related to a sustainable human future. Remote sensing (RS) imagery from different platforms (drone, airborne, and spaceborne) and different sensors (optical, thermal, SAR, and LiDAR) provides essential information for these applications in urban areas, with various characteristics and spatiotemporal resolutions. In particular, the continually improving spatial resolution makes it possible to describe the complex urban geographical system, and it is applicable for monitoring numerous natural and anthropogenic issues at different scales.
This Special Issue (SI) was intended to collect recent advances in the applications of RS imagery for urban areas, and 17 papers in total were selected and published. Among them, 12 papers emphasize novel urban application algorithms based on RS imagery, such as urban attribute mapping, building extraction, classification, and change detection [1,2,3,4,5,6,7,8,9,10,11,12], and 5 papers directly employ RS imagery to analyze environmental variations and urban expansion in typical cities, covering topics such as urban heat islands, air pollution, and lightning [13,14,15,16,17].
RS imagery provides new opportunities to extract urban building information and detect its changes, and four papers focus on this issue [1,2,3,4]. Cao et al. [1] proposed a stacking ensemble deep learning model (SENet) to obtain fine-scale spatial and spectral building information, based on a sparse autoencoder integrating the U-Net, SegNet, and FCN-8s models. The model was assessed on a building dataset from Hebei Province, China, and the results indicate that its accuracy is significantly improved compared to all three models. Xue et al. [2] proposed a multi-branched network structure to fuse the semantic information of building changes at different levels. Experiments with the WHU Building Change Detection Dataset showed that the proposed method obtained scores of 0.8526, 0.9418, and 0.9204 in IoU, Recall, and F1 Score, respectively, and could detect building change areas with complete boundaries and accurate results. Luo et al. [3] utilized GF-7 high-resolution stereo mapping satellite double-line camera images and multispectral images for building boundary segmentation, based on a multilevel features fusion network (MFFN). The results show that a high accuracy of 95.29% can be achieved in building extraction. A 3D building model can be efficiently built at Level of Detail 1 (LOD1) based on the extracted building vectors and elevation information from the digital surface model, and the urban scene was produced for realistic 3D visualization. Chen et al. [4] reconstructed bias U-Net with self-attention for semantic segmentation of building rooftops. Concretely, a self-attention module is added to learn the attention weights of inputs in the encoding part. The proposed method achieves IoU scores of 89.39% and 73.49% for the WHU and Massachusetts datasets, respectively.
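The IoU, Recall, and F1 scores quoted above for building extraction and change detection are all derived from pixel-wise counts of true positives, false positives, and false negatives. The following is a minimal sketch of that calculation, assuming binary masks stored as nested lists of 0/1 values; the function name `binary_segmentation_scores` and the toy masks are hypothetical and are not taken from any of the cited papers.

```python
from typing import Dict, Sequence


def binary_segmentation_scores(
    pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]
) -> Dict[str, float]:
    """Compute IoU, precision, recall, and F1 for two binary masks.

    Both masks are assumed to have the same shape, with 1 marking
    building (or change) pixels and 0 marking background.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # predicted building, truly building
            elif p and not t:
                fp += 1  # predicted building, truly background
            elif t and not p:
                fn += 1  # missed building pixel

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 when a class is absent instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "iou": safe_div(tp, tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


# Toy 2 x 3 masks (illustrative values only).
predicted_mask = [[1, 1, 0], [0, 1, 0]]
reference_mask = [[1, 0, 0], [0, 1, 1]]
scores = binary_segmentation_scores(predicted_mask, reference_mask)
print(scores)
```

Because IoU penalizes false positives and false negatives in the same denominator, it is never larger than F1 for the same prediction, which is consistent with the ordering of the scores reported for [2].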
Beyond building information extraction, classification, target detection, and change detection are also very important for urban applications using RS imagery, and five papers address these issues [5,6,7,8,9]. As for classification, Ling et al. [5] proposed a research framework to quantify the urban land cover (ULC) classification accuracy using optical and SAR data with various cloud levels, using three typical supervised classification methods. The experimental results indicate that the ULC classification accuracy decreases with increasing cloud content, and that the fusion of SAR and optical data can significantly reduce the confusion between land covers under clouds and improve the classification accuracy. Shi et al. [6] proposed an attention-guided classification method (AGCNet) for multispectral and panchromatic images, based on a lightweight multi-sensor classification network. AGCNet mainly consists of a share split network (SSNet) and a selective classification network (SCNet), which are used to better balance the classification performance and time cost. The classification maps and accuracies show the superiority of the proposed AGCNet, and it can be easily extended to other multi-sensor and multi-scale classifications. As for target detection, Chen et al. [7] proposed a Rotation-Invariant and Relation-Aware (RIRA) CDAOD network. It is trained at the image level and the prototype level based on a relation-aware graph to align the feature distribution, and a rotation-invariant regularizer is added to deal with the rotation diversity. The results show that the method can effectively improve the detection performance in the target domain and outperforms competing methods. Shen et al. [8] proposed an algorithm combining the constrained energy minimization (CEM) algorithm and an improved maximum between-class variance (OTSU) algorithm (t-OTSU) to obtain the initial target detection results and adaptively segment the target region. The detection accuracy is above 99%, and the false alarm rate is below 0.2%. Yang et al. [9] focused on change detection in high-resolution RS imagery and proposed an MRA-SNet model based on the U-Net network. The Siamese network is used to extract the features of the bi-temporal images separately in the encoder and to perform the difference connection, which generates better difference maps. The multi-Res blocks and the residual connections are applied to extract detailed spatial and spectral features at different scales, and the Attention Gates module is added to better focus on the changing features and suppress the irrelevant features.
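The maximum between-class variance (OTSU) criterion mentioned above picks the threshold that best separates a histogram into two classes. The sketch below implements the standard Otsu method only; it is not the improved t-OTSU variant of [8], whose details are not given here. It assumes detection scores that have already been quantized to integers in the range 0 to 255, and the function name `otsu_threshold` and the sample values are hypothetical.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t maximizing the between-class variance.

    Values <= t form class 0 (background); values > t form class 1
    (candidate targets). Inputs must be integers in [0, levels).
    """
    hist: List[int] = [0] * levels
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of pixel values in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # class 0 is still empty
        w1 = total - w0
        if w1 == 0:
            break  # class 1 is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy bimodal scores: a dark background cluster and a bright target cluster.
scores = [10] * 50 + [12] * 50 + [200] * 30 + [202] * 30
threshold = otsu_threshold(scores)
target_mask = [1 if s > threshold else 0 for s in scores]
print(threshold, sum(target_mask))
```

When several thresholds give the same variance, as they do across the empty gap between the two clusters in this toy input, the strict comparison keeps the lowest one.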
Three other papers address different demands of urban RS applications [10,11,12]. Chao et al. [10] analyzed the ability of contextual features from very-high-spatial-resolution (<2 m) and medium-spatial-resolution (Sentinel-2, 10 m) imagery to model urban attributes and population density in human-modified landscapes. The results suggest that contextual features can model urban attributes well at very high spatial resolutions, with out-of-sample R² values up to 93%. Feng et al. [11] addressed image quality for urban analysis and proposed a region-by-region registration algorithm that combines feature-based and optical flow methods. Concretely, the initial displacement fields for a pair of images are calculated by the block-weighted projective model in flat-terrain regions and by Brox optical flow estimation in complex-terrain regions. Abnormal displacements resulting from the sensitivity of optical flow to land use or land cover changes are adaptively detected and corrected by the weighted Taylor expansion. The experimental results demonstrated that the proposed method could achieve sub-pixel alignment accuracy for different optical RS images. Zhang et al. [12] investigated the mechanisms of the radar return changes induced by urban flooding under different polarizations and proposed an urban flooding index (UFI) for unsupervised detection of inundated urban areas. Sentinel-1 PolSAR data are used as the basic data, and Jilin-1 high-resolution optical images acquired on the same day are used for visual interpretation as ground truth. The results indicate that the UFI-based method can achieve higher overall accuracy than the conventional unsupervised method.
The remaining five papers analyze urban expansion and urban environmental changes [13,14,15,16,17]. Liu et al. [13] used time-series Landsat imagery to map and quantify the spatiotemporal dynamics of urban expansion from 1990 to 2020 in Xiaonan District in Hubei Province, China. The built-up area and urban land are extracted from the RS images using different classification methods. It is found that urban expansion first decreased and then increased over the last 30 years, and that the development of the secondary industry is the main driving force. Shen et al. [14] compared the spatiotemporal patterns of the surface urban heat island in the cities of Wuhan and Nanchang in China by fusing data from Landsat, MODIS, and AVHRR. Opposite spatiotemporal patterns are found between the two cities between 1984 and 2018, even though both are widely considered to be among the hottest cities and are called “Stove cities”. Nanchang presents higher and more fluctuating surface urban heat island intensity (SUHII) than Wuhan under different definitions of SUHII. Wang et al. [15] utilized 9-year datasets of cloud-to-ground (CG) lightning, aerosol optical depth (AOD), convective available potential energy (CAPE), and surface relative humidity (SRH) from ground-based observation and model reanalysis to analyze how CG lightning varies with aerosol conditions over three air-polluted regions of China. The CG lightning density is found to be higher under conditions with high sulfate and total AOD throughout the seasonal cycle over all the study regions. A slight decrease in CG lightning is found under most high dust AOD conditions. Xue et al. [16] used a series of RS images to explore how a typical resource-based mining city, Datong, has expanded and evolved over the last two decades (2000–2018), with a reflection on the role of urban planning and development policies in driving the city's spatial transformation. The results indicate that the area of urban construction land increased by 132.6% during the study period. Wang et al. [17] estimated and analyzed the nighttime PM2.5 concentration based on LJ1-01 images, taking the Pearl River Delta (PRD) urban agglomeration of China as an example. Based on radiative transfer theory, a correlation model of the nighttime light radiance and ground PM2.5 concentration is established. The results indicate that the R² value between the model-estimated and measured values is 0.82 in the PRD region, and the model attains a high estimation accuracy.
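Several of the results above are reported as R² values between model estimates and measurements. A minimal sketch of one common definition, the coefficient of determination, is given below. It is an assumption that this definition matches the one used in the cited papers, since some studies report the squared correlation coefficient instead. The function name `r_squared` and the sample numbers are hypothetical and are not real PM2.5 data.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after applying the model.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up measured and model-estimated values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(measured, estimated)
print(round(r2, 4))
```

Under this definition, a value of 1 means the estimates reproduce the measurements exactly, while a value near 0 means the model does no better than predicting the mean of the measurements.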
In summary, this SI provides an enhanced understanding of the applications of RS imagery for urban areas. New methodologies are presented for extracting building information, modeling urban attributes and population, and detecting urban flooding using RS imagery [1,2,3,4,10,12]. Novel models for RS classification, target detection, and change detection are also included, which can significantly support further urban applications [5,6,7,8,9]. Different analyses of urban expansion, urban heat islands, air pollution, and lightning can advance our understanding of the interactions between urban development and the regional environment [13,14,15,16,17]. Almost all of these studies use satellite RS imagery from optical, thermal, or SAR sensors. This indicates that satellite RS is still the mainstream, and that the potential of data from drones and airborne platforms should be explored further. Among the 12 papers that proposed new methods, 10 used machine learning networks. This indicates the new opportunities brought by the development of machine learning technology, and employing these new technologies together with RS data to monitor the cities in which we live is a promising trend.