Generating High-Quality and High-Resolution Seamless Satellite Imagery for Large-Scale Urban Regions

Urban geographical maps are important to urban planning, urban construction, land-use studies, disaster control and relief, touring and sightseeing, and so on. Satellite remote sensing images are the most important data source for urban geographical maps. However, for optical satellite remote sensing images with high spatial resolution, certain inevitable factors, including cloud, haze, and cloud shadow, severely degrade the image quality. Moreover, the geometrical and radiometric differences amongst multiple high-spatial-resolution images are difficult to eliminate. In this study, we propose a robust and efficient procedure for generating high-resolution and highquality seamless satellite imagery for large-scale urban regions. This procedure consists of image registration, cloud detection, thin/thick cloud removal, pansharpening, and mosaicking processes. Methodologically, a spatially adaptive method considering the variation of atmospheric scattering, and a stepwise replacement method based on local moment matching are proposed for removing thin and thick clouds, respectively. The effectiveness is demonstrated by a successful case of generating a 0.91-m-resolution image of the main city zone in Nanning, Guangxi Zhuang Autonomous Region, China, using images obtained from the Chinese Beijing-2 and Gaofen-2 highresolution satellites.


Introduction
The generation of a high-quality seamless urban geographical map is significant for urban landuse mapping and urban land planning [1][2][3][4]. Generally, high-spatial-resolution images (HRIs) from satellite remote sensing platforms are required for precise urban mapping. In recent years, small satellites, as a new generation of satellites, have received extensive attention and developed rapidly because of their light weight, small volume, low cost, and flexible launch capability. In addition, small satellites can be further netted as distributed constellations which facilitate the acquisition of highspatial-resolution remote sensing images with a high imaging quality and short revisit cycle. However, several issues must be considered during urban image generation with small satellite HRIs, such as Beijing-2, as shown in Figure 1. HRIs can be divided into the issues coming from an individual image and the issues arising amongst multiple images.
On the one hand, a single optical satellite image will inevitably be affected by a high proportion of haze and cloud cover [5][6][7]. Thus, dehazing and cloud removal should be utilized to obtain a clear and spatially seamless image. These issues come from individual images that make generating a highquality seamless urban image a challenging task. On the other hand, for imagery containing multispectral and panchromatic bands, the spatial resolution of the multispectral image and the spectral resolution of the panchromatic image can be enhanced by fusing them. Thus, pansharpening [8][9][10] is also required. Furthermore, the swath width of images is usually reduced with an increase in spatial resolution. Thus, image mosaicking is an essential step for large-scale urban image generation [11][12][13] but is accompanied by new problems. Geometric misalignment and radiometric incoherency in adjacent images can cause seam lines in the image mosaicking process. Hence, image registration and radiometric normalization must be conducted [14][15][16], which also brings new challenges to the procedure of urban image generation. These issues are currently tackled independently and inefficiently. Thus, suitable images are often unavailable to be merged into an urban image. The lack of a systematic solution and the requirements of urban mapping are the motivations for developing an integrated procedure to address the abovementioned issues. In this study, a procedure for high-quality seamless urban image generation, integrating image registration, dehazing and thin cloud removal, thick cloud detection and removal, pansharpening, and image mosaicking is proposed. Specifically, our attention is focused on cloud removal, including a novel spatially adaptive method for removing thin clouds with non-uniform atmospheric light and a stepwise local moment matching (SLMM) method for removing thick clouds. The case study of the main city zone of Nanning, Guangxi Zhuang Autonomous Region, China, shows the generated results of the procedure. The experimental results confirm that the proposed procedure is a promising and effective way for generating large-scale highquality seamless images of urban areas.
The remainder of this paper is organized as follows. Section 2 discusses the related works. Section 3 describes the proposed procedure and method for generating a high-quality seamless urban methods. The area-based methods fully utilize the intensity information, and the emphasis is placed on the similarity metric construction, such as correlation coefficients (CC), mutual information (MI) [57], and normalized correlation coefficients (NCC) [58]. Feature-based methods use distinct features, such as points, lines, and regions, to represent the image to be registered. The most representative feature extraction algorithms, such as scale-invariant feature transform (SIFT) [59] and speeded-up robust features (SURF) [60], are mostly utilized in feature-based frameworks. In addition, many excellent algorithms for image mosaicking exist, which generally consists of radiometric normalization [61][62][63][64][65], seam line detection [66][67][68] and image blending [14,[69][70][71]. However, image registration, radiometric normalization, and seam line detection are still challenging tasks for the mosaicking of high-resolution remote sensing images.
Image registration, radiometric calibration, and image blending are fundamental steps for addressing the issues of geometric misalignment, radiometric incoherency, and large-scale maps, respectively. The objective is to generate a large-scale image that involves multiple scenes. However, the mosaicking of HRIs is more difficult than that of lower-resolution images. Achieving accurate image registration, which is essential for image mosaicking, is difficult due to the limited imaging angle and land-cover changes. The determination of the seam lines for the images to be mosaicked is also important for seamless mosaicking and is often a time-consuming process. Furthermore, the other issues shown in Figure 1, including haze and thin cloud, thick cloud, cloud shadow, and pansharpening, are not all covered in the existing literature. Most of the studies have addressed these issues separately. For example, cloud cover is usually considered when mosaicking remote sensing images. On the one hand, clouds are often handled independently by scholars in terms of cloud detection and removal. Roy et al. [72] utilized cloud masking to generate mosaics of the conterminous United States. The generation of sea ice maps is further dependent upon cloud removal in the compositing process [73,74]. On the other hand, certain other issues have also been addressed, in addition to clouds. Helmer and Ruefenacht [12] used regression tree prediction and histogram matching for seamless mosaicking to support change detection in persistently cloudy regions of Landsat images. Hewson et al. [75] reduced the effects of atmospheric and surface scattering through atmospheric water vapor correction and cloud/shadow masking for the seamless geological map generation of the Broken Hill-Curnamona province of Australia, using ASTER imagery. Zhang et al. [76] considered cloud cover and geometric correction in the workflow to produce digital orthophoto maps over a national scale. Li et al. [77] applied cloud masking and image pansharpening techniques to produce a cloud shadow-free and high-resolution mosaic based on co-registered IKONOS images. In addition, attention has been paid to the radiometric differences. Du et al. [78] combined radiometric normalization with pixel-wise compositing and quality control in the process of mosaicking high-resolution satellite images. Radiometric balancing has also been applied in the image mosaic generation of ASTER thermal infrared data [79]. In summary, a comprehensive and effective procedure is required to generate a high-quality seamless and large-scale urban image from high-spatial-resolution satellite images.

Method
In this study, a procedure is proposed for high-quality seamless urban image generation from high-resolution satellite images. Methodologically, a spatially adaptive method considering the uneven atmospheric scattering and a novel algorithm known as SLMM are correspondingly proposed to remove the thin and thick clouds of the images, respectively. Figure 2 shows the framework of the proposed procedure. The images are initially pre-processed by systematic geometric correction and then classified into target and auxiliary images. The target images are determined as images that are close to the target date and have good quality, whereas the auxiliary images are selected on the basis of the cloud cover of the target images. The step of dehazing and thin cloud removal is then conducted for the target images covered by haze and thin clouds. Subsequently, the cloud-contaminated target images are transferred for thick cloud removal on the basis of the results of thick cloud detection. The cloud-free multispectral and panchromatic images are combined for pansharpening and mosaicking after overlapping image registration. Finally, the high-quality seamless urban image of the target area can be obtained by the post-processing steps, including target area clipping and radiometric adjustment.

Pre-Processing
In the pre-processing steps, all available remote sensing images are initially resampled to the same resolutions for multispectral and panchromatic images after systematic geometric correction. Subsequently, all images are classified into target and auxiliary images. The target images are close to the target date (e.g., yearly/seasonally/monthly period depending on available images and users' purposes after considering possible land cover changes due to vegetation phenology and urban expansion, etc.) and have a good quality in the target area, whereas the auxiliary images are selected on the basis of the cloud cover of the target images. The cloud cover percentages can be derived from the image header files or estimated by the existing cloud detection methods. However, considering the low accuracy of the provided cloud cover percentages or masks, a more accurate cloud detection method should thus be utilized. Finally, image registration is applied to eliminate possible misalignments between the multispectral and panchromatic images, and cloudy regions in the target and reference images, respectively. The area-based method has a weak capability to handle large geometrical deformation and is time-consuming. Thus, a feature-based method is adopted to register the images, implemented by the steps shown in Figure 3, in which sensed images denote the images to be registered. Following the image registration process, the reference and sensed images are input into ENVI. Most of the distinct feature points are extracted and matched with the corresponding points in the sensed image via the template matching algorithm. However, considering that the ranges of the images to be registered are large, and that a global transformation model will be estimated, the extracted feature points should be distributed uniformly and sparsely. Such feature points can be achieved by combining necessary manual adjustments. The feature points are empirically extracted in the building corners, road intersections and so on to supplement the feature points locally. With the satisfactory feature points, the geometrical relationship between the reference and sensed images is calculated by a linear polynomial model. The coordinates in the sensed image are then transformed and resampled by a bilinear interpolation algorithm. The same process is also used to register the corresponding target and auxiliary images and the corresponding panchromatic and multispectral images to alleviate the geometrical displacement.

Dehazing and Thin Cloud Removal
In this study, a spatially adaptive haze removal method is proposed on the basis of the DCP [21] and hazy image model [80][81][82]. The variation of atmospheric scattering, which is effective in removing uneven haze and producing a vivid result, is considered. The DCP is a statistical law that states that the intensity of certain pixels in local patches tends to be zero in at least one or several channels for the non-sky regions of clear images. The hazy image model describes the radiance composition of images under a turbid atmosphere, which can be expressed as [21] where is the hazy/thin cloud image, denotes the clear image, is the transmission, and is the atmospheric light. Thus, we can remove the haze/thin cloud via the hazy image model if and are available. Figure 4 depicts the flowchart of the haze/thin cloud removal procedure. In theory, the portion of the path radiance that is scattered and reaches the sensor will correspond to atmospheric light A in the hazy image model. However, the atmospheric light is usually assumed to be a constant in the entire scene. This assumption is true for an image covered with uniform haze, such as most natural images. The reason is that the scattering intensities of these images are relatively stable, and the scattered path radiance reaching the sensor will be almost the same in the entire scene. However, remote sensing images have particularly broad imaging range and are often covered with uneven haze. For such images, the scattered path radiance will vary in the spatial dimension, suggesting that the atmospheric light should be spatially varied. Therefore, developing an estimation algorithm that reflects the spatial characteristic of atmospheric light is necessary.
Changes in scattering intensity generally cause uneven luminance in the image, indicating that the luminance can reflect the scattering intensity distribution in space. Therefore, the luminance is an important clue for estimating non-uniform atmospheric light. On the basis of the above analysis, we propose a luminance-based atmospheric light estimation strategy, entitles the non-uniform atmospheric light model. This model can be defined as follows: where is the non-uniform atmospheric light and and ∆ are the basic atmospheric light and atmospheric light increment, respectively. The former is a constant and the latter is a variable related to the haze heterogeneity.
The non-uniform atmospheric light of an arbitrary image is given by the following two steps: (1) calculating the atmospheric light increment from the luminance fieldw and (2) estimating the basic atmospheric light from the original image. Firstly, a Gaussian low-pass filter is selected to obtain the luminance from the hazy image. As the atmospheric light is locally stable, a minimum filter is adopted to post-process the luminance. Hence, the increment can be expressed as where ̃ is the luminance of the image, ̃ is a minimum filter with window , in which the window size is related to the heterogeneity of the haze, ̃ is the minimum of ̃, and ∆ is the estimated atmospheric light increment.
To calculate the basic atmospheric light , the 0.1% pixels with the largest dark channel intensity, which are calculated by the traditional DCP, are initially selected. Moreover, one of the pixels with the largest intensity is regarded as the value of basic atmospheric light. Once and ∆ are obtained, the non-uniform atmospheric light of the input image can be estimated via Equation (2). With , the transmission of the image can be calculated and the haze/thin cloud-free image can finally be acquired. In addition, for the panchromatic channel, and are upsampled to the same size to remove haze/thin cloud.

Thick Cloud Detection and Removal
For the thick cloud of the high-resolution images, a cloud detection method and a cloud removal method are introduced in the following content.

Thick Cloud Detection
The multi-feature combined (MFC) method proposed by Li et al. [33] is used to screen cloud and cloud shadow in the haze/thin cloud-free images on the basis of the results of the dehazing and thin cloud removal. Figure 5a shows the process flow of the MFC algorithm. The MFC method combines multiple features (i.e., spectral, shape and texture) to implement object-based cloud and shadow detection. MFC initializes a rough cloud mask on the basis of spectral threshold segmentation, and the core cloud regions are captured after this step. A fine mask is then produced by applying guided filtering and binarization processing, in which the thin clouds around the cloud boundaries are captured. Finally, the non-cloud bright objects in the refined cloud mask are removed by using objectbased shape and texture features to improve the cloud detection results and produce the final cloud mask. Furthermore, a cloud shadow mask is obtained on the basis of the cloud mask, which is screened by associating clouds and the detected shadows through object-based matching and correction, with the aid of satellite viewing and solar angles. Notably, a higher priority is set for cloud than cloud shadow in the final mask.
The MFC method was originally proposed for Gaofen-1 wide-field-of-view imagery, which contains three visible bands and a near-infrared band. It can also be applied to other types of imagery that have similar band settings. In this study, MFC is used to produce cloud and cloud shadow masks for the experimental images after parameter optimization. Several key spectral thresholds in MFC are manually fine-tuned through a series of tests and parameter-sensitive analyses. Manual correction is required in a few cases because cloud and cloud shadow detection for high-resolution images is challenging, and the results produced by MFC are not completely accurate. Subsequently, the corrected masks are used as guidance for the subsequent cloud and cloud shadow removal.

Thick Cloud Removal
In this study, a simple but effective cloud removal method based on SLMM is proposed to recover the cloud-contaminated areas. The proposed SLMM cloud removal method has fidelity advantages for the spectral and spatial details in the recovered cloud-contaminated regions. Figure  5b illustrates the flowchart of the SLMM method. The inputs of the method are the target image that should be recovered, the cloud and cloud shadow mask of the target image, and the auxiliary image, which is another temporal image covering the same area as the target image. Assuming that the auxiliary image is cloud-free and has been registered with the target image, our goal is to recover the cloud-contaminated areas in the target image by combining the complementary information from the auxiliary image whilst ensuring that the recovered areas are seamless.
The proposed SLMM method consists of three main steps. Initially, connected component labeling is conducted, in which the pixels marked as cloud or cloud shadow in the target mask and connected in eight neighbours are labeled as independent objects. Then, as shown in Figure 6, each target image patch located around an object can be acquired by extending the height and width of the object's bounding rectangle (black rectangle in Figure 6a) with specific pixels (e.g., 200) in all four sides. For each patch T, the SLMM procedure is implemented to recover the contaminated areas pixel by pixel through stepwise replacement. Notably, the contaminated areas to be recovered are labeled in the target mask and recovered along the cloud object boundaries to the center. Such a process can be controlled by stepwise one-pixel erosion of the target mask, in which recovered pixels are treated as the known cloud-free pixels and involved in the recovery of the remaining contaminated pixels labeled in the target mask. Specifically, the recovered cloud-contaminated pixel ( , ) at ( , ) in the target image patch can be acquired by the linear transformation of pixel ( , ) in the auxiliary image, as shown in Equation (4). Such pixel involves valid and cloud-free pixels in local window centred at ( , ) in the target and auxiliary image patches. The linear transformation in Equation (4) is also called local moment matching, where σ and σ are the standard deviation of the valid pixels in in the target and auxiliary image patches, respectively. Moreover, and are the corresponding mean values. Notably, the size of window k is equal to 2 + 1, where is empirically set to 80 in the experiments of this study. Finally, boundary smoothing is conducted at one-pixel outer and inner edges of the recovered areas to eliminate edge effects. Consequently, the Gaussian smoothing kernel is empirically set with a size of 3 × 3 and a standard deviation of 1.6. The contaminated areas in the target image are recovered object by object and channel by channel.
The stepwise strategy progressively recovers the contaminated pixels from object boundaries to the center and treats the recovered pixels as cloud-free pixels. In addition, this strategy makes the proposed cloud removal method effective for large-scale thick cloud removal, specifically in highresolution images. The spatial details from the auxiliary image can be well preserved in the recovered areas because the SLMM method is a type of intensity adjustment-based method.

Pansharpening
The fast pansharpening algorithm used in this study is based on the CS-based fusion framework and fully utilizes the high efficiency of the CS-based method [10,83]. Figure 7 depicts the flowchart of the proposed pansharpening method, which can be represented as: where = 1,⋅⋅⋅, , is the fused high-spatial-resolution multispectral (HR-MS) image and � is the low spatial resolution multispectral (LR-MS) image upsampled to the scale of the panchromatic (PAN) image.
is the band number of the multispectral image, and the subscript indicates the th spectral band. is the intensity component of � , which is calculated by a linear combination of spectral bands in � , where the linear combination coefficient k w is obtained by Equation (6).
represents the PAN image. The histogram match is performed between and to make and I have the same mean and variance values, which can reduce the spectral distortion [83] caused by the gap between the two data levels.
is the band-dependent injection weight, which is calculated by the average gradient of each spectral band in the LR-MS image, as shown in Equation (7).
where l P is the PAN image downsampled to the size of the LR-MS image, that is, k MS in Equation (6). The linear combination coefficient k w is obtained on the basis of least square fitting.
where ( ) gra ⋅ is the function for obtaining the average gradient of the image. The average gradient is the mean value of the gradients of all pixels in the image. The used pansharpening method includes three main parts: (1) spectral band combination, (2) spatial structure information extraction, and (3) spatial information injection. Initially, the intensity component is calculated by a linear combination of the spectral bands of the upsampled multispectral image. The high-frequency spatial details that only exist in the panchromatic image are then extracted by subtracting the intensity component from the histogram-matched PAN image. Finally, the extracted spatial structure information is injected into the multispectral image by the band-dependent injection weight.

Mosaicking
The images are registered in the pre-processing step; thus, registration is not conducted here. The radiometric differences of the neighbouring images should be reduced to make the mosaicked images as similar as possible visually. Radiometric normalization is selectively implemented because additional subtle radiometric adjustments will be implemented later. Seam line detection is helpful to obtain a boundary line where small radiation differences between the images are to be mosaicked. Around the seam line, image blending is operated, which aims to achieve a uniform transition on both sides of the seam line. However, in this section, emphasis is put on image blending, as the seam line detection can be automatically conducted in ENVI, or the image edge can be utilized as a seam line.
The radius for the image blending is then set on the basis of the overlapped area of the neighboring images. For image blending, as shown in Figure 8, the following equation is used to obtain the natural mosaicked images: where is the mosaicked image and and are the left and right images to be mosaicked, respectively. and are the weights of the left and right images, respectively. ( , ) is the coordinate of the mosaicked image. Furthermore, if we set the feathering distance as , then the weight can be calculated as follows: where ( , ) and ( , ) are the coordinates of the left and right feathering edges. In this way, the grey value in the feathering distance gradually transitions from the left to the right image, thereby eliminating the seam line.

Post-Processing
In the post-processing steps, the main city area is clipped from the mosaicked image. In addition, the image is linearly stretched to ensure that the clipped urban image has good visual effects and to trim extreme values from both ends of the histogram using a specified percentage. Thus, the image contrast is improved for the final high-quality seamless urban image generation.

Urban Map Generation Experiments and Results
The experimental images consisted of two types of Chinese high-resolution satellite images, namely Beijing-2 and Gaofen-2. The Beijing-2 images are delivered by the commercial TripleSat constellation, which is composed of three high-resolution satellites launched in 2014. Beijing-2 offers 1-m panchromatic and 4-m multispectral ground sampling distance imagery with a swath width of 23 km. Gaofen-2 is one of a series of high-resolution optical Earth observation satellites launched by the China National Space Administration. Moreover, Gaofen-2 is configured with two identical cameras and has been operational since October 2015. The two cameras on Gaofen-2 are capable of collecting images with 0.81-m panchromatic and 3.24-m multispectral bands, on a combined swath width of 45 km. The multispectral images of Beijing-2 and Gaofen-2 share a similar spectral band setting and spatial resolution, and both have three visible channels and a near-infrared channel. The Beijing-2 and Gaofen-2 images have been extensively used to provide Earth observation information for applications, such as monitoring land and water resources, agriculture, urban development, and environmental impact assessment.
The experimental images covered the main city areas of Nanning in Guangxi Zhuang Autonomous Region, China. Nanning is the capital city and the largest city of the Guangxi region, which is located in the southwest of China. All available images were classified into two categories, namely, the target images used to generate the final urban image of Nanning City and the auxiliary images used to remove cloud in the target images. Table 1 shows the selected experimental images, in which the location, date and cloud cover percentages are provided, and the target and auxiliary images are numbered as T1-T5 and A1-A2, respectively. Specifically, the five Beijing-2 images acquired in October and December 2017 were considered as the target images to composite the main urban area of Nanning. Moreover, a Beijing-2 image acquired in May 2017 and a Gaofen-2 image acquired in October 2016 were used as the auxiliary images. All experimental images were level-1 data, that is, raw digital products with the process of homogenized radiation calibration. Figure 9 shows the distribution of the target images, in which the five Beijing-2 images cover the entire main city area of Nanning, and the yellow boundary is the G7201 Nanning Ring Expressway. The adjacent images are mismatched at the borders and have evident radiometric inconsistencies. All four images are contaminated by haze and cloud, to different degrees. * The actual percentages of cloud cover are slightly higher than the given percentages, which were derived from the image header files. Following the above procedure, a 0.91-m high-quality seamless main urban image of Nanning city was obtained. Generating this image took approximately 18 hours, which was implemented on a computer with an Intel Core i7-8700K 3.70 GHz CPU in the Windows 10 environment with 32 GB RAM. Notably, the necessary manual processing, which benefits the accuracy improvement and mainly includes manual interaction and mask correction, takes approximately 10% of the total processing time. In this subsection, the final experimental and intermediate results are shown to demonstrate the effectiveness and necessity of each step in the proposed procedure. Figure 10 shows the generated urban image of Nanning City. The urban image generated by the proposed procedure is of high quality (cloud-free and clear), seamless (spatially continuous), and of high resolution (0.91m resolution). In detail, no clouds can be found, and even by a careful check of the entire urban image, the buildings and roads are clear, as well as the open spaces, including the park, sports field and leisure square. The seam lines are eliminated well and cannot be visually identified. Thus, the generated image is spatially seamless. The pansharpened image has a high spatial resolution and keeps the abundant spectral information. The color of the water body of the Yongjiang River, which flows across the main city area of Nanning, is uniform. Figure 10. Generated a large-scale high-quality seamless urban image of Nanning City.
Examples are given by comparing the results before and after the specific processing, in terms of dehazing and thin cloud removal, thick cloud and shadow removal, image pansharpening and image mosaicking. The objective is to illustrate the local details of the generated image.
Haze and thin clouds are common in optical satellite images and result in information being partially missing in the image. A method of dehazing and thin cloud removal is utilized to recover the information in haze-covered areas. Figure 11 shows that the hazy images are enhanced, and the haze distributed over the urban and vegetated areas is removed. Moreover, the haze-free areas are unaffected and the details are well kept. However, the haze removal method cannot handle thick clouds and may cause residual cloud. Thus, thick cloud removal is subsequently undertaken. Differing from haze or thin clouds, thick clouds or shadows in images usually lead to information that is totally missing. In this case, we reconstruct the missing areas in the cloudcontaminated images with the aid of the auxiliary images of the same area, and thick cloud detection and cloud removal are utilized for reconstruction. Figure 12 displays the examples of cloud and shadow removal, in which cloud-contaminated images distributed over urban and vegetated areas are used. The cloud-removed images seem natural and have good color consistencies. In addition, the spatial details of the missing areas are well recovered, even where the missing areas are large, which indicates the effectiveness of the proposed cloud and shadow removal method. Image pansharpening is essential for spatial resolution improvement, and it involves fusing the abundant spectral information in the multispectral image and the spatial details in the panchromatic image. Figure 13 shows the examples of the image pansharpening undertaken in this study, which show evident spatial detail improvements. In the first example, the cars on the bridge can be seen, and the outlines of the ships on the river are clear after pansharpening. The second and third examples are of a train station and circular interchange, respectively, in which the spatial details are clearly enhanced. The last example is a vegetated area, in which the colors of the trees and water are well kept from the original multispectral images. These examples demonstrate that the pansharpening method is an effective way to improve the spatial resolution of multispectral images. Mosaicking the neighboring images is a key step for the generation of the final urban image. In our implementation, the overlapping areas of the neighboring images are first registered. The order of mosaicking the images is then determined by the image date and the percentage of valid coverage, which means that an image which is closer to the target date and has a high coverage percentage of the target area has a higher priority. Finally, the images are mosaicked as described in Section 3.5. Figure 14 shows examples of image mosaicking in complex urban areas and a vegetated area. The misalignments between neighboring images are obvious, as well as the color differences, while the mosaicked images are seamless (spatially continuous) and have good color consistency. The results confirm that the applied image registration and mosaicking methods are effective.

Comparisons of Dehazing and Thin Cloud Removal Methods
To validate the effectiveness of the haze removal method, two existing methods were selected for comparison, including the classical DCP method (CDCP) [21] and the dark channel-saturation prior method (DCSP) [84]. For a fair comparison, all parameters in these methods were set to be optimal via iterative adjustment. Figure 15 shows the haze removal result for an urban area using different methods. Clearly, most of the image is covered by haze, making the brightness of the ground features higher than the actual values. The haze effects were largely eliminated, but certain residual haze can still be seen in the top left corner as indicated by the CDCP's result. Compared with CDCP, the haze removal capability of DCSP was weaker and the haze was still significant in the result. Figure 15d depicts the result of the proposed method. The high brightness caused by haze is completely eliminated, and the ground details were recovered well. Thus, the proposed method shows better performance than the comparison methods in removing haze for urban regions. Figure 16 shows a haze removal instance for a forest area. The haze in the image was scattered and the intensity varies in spatial. Similar to Figure 15, CDCP removes the light haze but the dense one still remains. By contrast, DCSP's result is worse, where the haze cannot be removed but the spectral of clear regions altered. The result of the proposed method indicates that the uneven haze contamination completely vanished, whereas the spectral of clear regions remains. The reason is that the proposed method fully considers the scattering variation spatially, and the non-uniform atmospheric model accurately describes the change. In summary, the proposed method holds superior capability in removing uneven haze and maintaining the spectral of clear regions, and benefits for generating high-quality and high-resolution seamless satellite imagery for large-scale urban regions.

Comparisons Thick Cloud Removal Methods
In order to quantitatively evaluate the performance of the proposed SLMM method in cloud removal for high-resolution images, we simulated cloud-contaminated images by adding thick clouds to cloud-free Beijing-2 images used in this study, and measured the differences between the cloud removal results of SLMM and ground truths (i.e., original cloud-free images). Thick cloud removal methods, including localized linear histogram match (LLHM) [85] and modified neighborhood similar pixel interpolator (MNSPI) [86], are used for method comparisons. Correlation coefficient (CC), root-mean-square error (RMSE), universal image quality index (UIQI), and structural similarity (SSIM) index are utilized as the quantitative metrics for the accuracy evaluation of different methods.
We conduct two groups of simulated experiments, in which simulated thick clouds are added to vegetation and urban areas. Figure 17 shows the cloud removal results, and Table 2 lists their quantitative evaluation results. Accuracies are calculated only on the basis of the cloud removed areas and are the averaged values of four multispectral bands in Beijing-2 images. It can be seen that the proposed SLMM method achieves the best visual as well as quantitative results amongst the compared methods. Specifically, color distortion exists in the cloud removal results of LLHM (regions marked in red in Figure 17). By contrast, the results of MNSPI are influenced by the produced noises and artefacts, which lead to the loss of spatial details (regions marked in yellow in Figure 17). Thus, LLHM and MNSPI achieve lower CC and SSIM than SLMM. The cloud removal results of SLMM well preserve the spatial details and appear visually seamless in the recovered areas. Benefiting from the efficient implementation, SLMM is more effective than the other methods and takes 1.95 seconds to recover the two cloud-contaminated images in Figure 17 in total. By contrast, LLHM and MNSPI take 11.20 seconds and 166.31 seconds to accomplish the same tasks, respectively.
Although SLMM acquired the best cloud removal result, notably, the recovered spatial details in cloud removal results of SLMM are linearly transformed from the auxiliary image. Thus, the recovery accuracy will be influenced by the land cover changes between the cloud-contaminated and auxiliary images. Considering that SLMM is used to support the generation of yearly/seasonally urban geographical maps, the errors bringing from land cover changes between multitemporal images can be accepted to a certain degree.

Discussion
In the proposed procedure, a series of steps are utilized to achieve improvements in image quality and spatial resolution. However, errors are inevitable in each step; thus, the error sources and their influences were analyzed. The limitations of the applied methods were also discussed.
Image registration is a fundamental step that is aimed at eliminating the geometric misalignment and is of critical importance for multi-temporal thick cloud and shadow removal, as well as image mosaicking. Minor misalignment may remain between local regions of two temporal images, which may result in a slight spatial discontinuity in the boundaries of the cloud-removed areas and the seams of the images to be mosaicked.
In terms of dehazing and thin cloud removal, we only conduct this step on the hazy images. As the haze removal method is based on the concept of dark pixel, the approach will be efficient when the dark pixel exists and is sufficient in the study regions. However, when the dark pixel is lacking, for example, the bright soils, the method will be invalid and may cause color distortion (see Figure  18). Moreover, the method estimates the haze intensity with a slide overlapping window, assuming that the haze is uniform in local regions. Thus, for the haze contamination which varies violently in spatial, the over-or underestimated intensity will decrease the accuracy of the method, in which radiometric difference may occur in bright surface covered areas, but the radiometric normalization in the mosaicking reduces this kind of difference. For thick-cloud detection and removal, manually corrections of cloud and clouds shadow masks are necessary in certain cases to ensure the complete recovery of contaminated areas. We instead use the similar pixel regression-based method of SLMM to preserve spatial details in the cloud-removed image, which are transformed from the auxiliary image, without changing the land-cover types. In this case, the abrupt temporal changes of land cover between the areas of the target cloudy image and auxiliary image may lead to spatial discontinuity around the boundaries of the cloud-removed areas.
The pansharpening method used in this study is effective and suitable for fast fusion large-scale remote sensing images. However, local spectral distortion may occur in large-scale image fusion, as shown in Figure 19a,b. Evident spectral inconsistencies exist between the observed LR-MS image (Figure 19a) and the fused HR-MS image (Figure 19b). Such inconsistencies may be due to the difference between the overall brightness and the local brightness of large-scale images and can be solved by residual correction. However, Figure 19a,b show that, in most cases, the local area with spectral distortion seems natural and consistent with human visual perception. Thus, no strategy has been applied. Furthermore, as an intermediate step in the entire process, the accuracy of pansharpening can be affected by the previous steps. Figure 19c,d present an example of low fusion accuracy. Figure 19c exhibits the blurry fusion result caused by low registration accuracy, and Figure  19d shows a clear fusion result after improving the registration accuracy.
Given that the experimental images may be different in other applications, the above errors are dependent on the actual cases of used images and applied methods. Our experience suggests that simple but effective methods are important for the processing of large-area and high-resolution satellite images.

Conclusions
Degradation factors, including haze, cloud, and cloud shadow, reduce the quality of optical remote sensing images, as well as the issues of geometric misalignment and radiometric inconsistency, making the generation of large-scale high-quality and seamless urban imagery a challenging task. In this study, a robust procedure for generating high-quality and seamless urban imagery is proposed. Specifically, a spatially adaptive method considering the uneven atmospheric scattering and a new algorithm (SLMM) are proposed to remove thin and thick clouds, respectively, and their effectiveness is verified by comparative experiments. A case application in Nanning City, China, showed that the proposed procedure and methods are effective and can handle the complex issues of geometric misalignment, haze and cloud cover, pansharpening, and radiometric inconsistency. In our future study, additional effective methods in the presented procedure will be investigated to improve the generated urban image. Quality bands for the generated image, which track the processing errors in each step and mark the pixel quality, will also be constructed.

Author Contributions:
The research topic was designed by X.L., Z.L. and H.S.; X.L., Z.L., R.F., S.L. and H.S. provided suggestions with regard to the method and experiments. X.L. and Z.L. performed the research and wrote the manuscript. Z.L., R.F., S.L., C.Z. and M.J. checked the experimental data, examined the experimental results, and participated in the revision of the manuscript. All authors have read and agreed to the published version of the manuscript.