1. Introduction
Thin-cloud areas in remote sensing images contain information on both the thin clouds and the underlying land surface; the radiometric distortion of the land cover results in image classification and target detection errors, which restricts land cover-based applications. The thin-cloud mask, which aims to delimit the pixels contaminated by thin clouds, is a critical preprocessing step for the accurate utilization of the data. Because they are jointly influenced by the thin-cloud thickness and by the diversity and complexity of the obscured land covers, thin clouds are difficult to describe with a uniform spectral characteristic. In addition, thin clouds are often confused with highly reflective land covers. As a result, it is difficult to apply traditional thick-cloud detection methods to thin-cloud masking. To overcome these problems, several thin-cloud detection/mask approaches have been developed according to the characteristics of thin clouds. Similar to Sun et al. [
1], we can classify these approaches into three main categories: the transform method, decomposition method, and dark object method.
The transform method converts remote sensing images from the original multiband space into a new feature space via certain operations to highlight thin clouds and suppress the background (i.e., the clear area). Because the reflectance in the red band is highly correlated with that in the blue band under clear-sky conditions, digital number (DN) values are scattered along a line known as the clear-sky line. Affected by haze and thin clouds, the DN values in the red and blue bands increase simultaneously. However, the increase in the blue band is larger than that in the red band, which causes the pixels contaminated by thin clouds to depart from the clear-sky line. By manually selecting an area free of thin clouds, the parameters of the clear-sky line are estimated, and the haze-optimized transform (HOT) is implemented by rotating the coordinate system using the obtained parameters; thin-cloud detection and removal are then conducted in the transformed domain [
2]. However, because different land covers, such as buildings, bare soil, snow, and ice, have different clear-sky lines, using an identical clear-sky line for an image with diverse land cover may lead to under- or over-correction problems. The land cover type-based method can effectively overcome these drawbacks [
3]. Moreover, the iterative HOT method introduces multitemporal images to iteratively study and eliminate confusion regarding thin clouds and high-reflective land covers [
4]. The correlation between the red and blue bands could also be utilized to realize the semi-automatic selection of clear/thin-cloud areas and automate the transform procedure [
The semi-inversed image, obtained by replacing the RGB values of each pixel with the maximum of the initial channel value (r, g, or b) and its inverse (1 − r, 1 − g, or 1 − b, respectively), is employed for haze detection and removal, based on the fact that the hue difference between the original image and the semi-inversed image is larger in hazy areas than in clear areas [
6]. Other transform methods include the Kauth–Thomas (KT) transformation, where the fourth component of the transformed result mainly corresponds to noises, including thin clouds, which can be used for thin-cloud evaluation [
2,
7,
8]. While the KT transformation uses a constant transformation coefficient for an identical sensor, the background suppression haze thickness index (BSHTI) transform, which suppresses the background and highlights the thin-cloud area, obtains the transformation coefficient via statistical information from the manual selection of clear and thin-cloud areas and detects haze in the transformed images [
9].
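To make the transform idea above concrete, the HOT can be sketched as follows; the function name, the least-squares fit of the clear-sky line, and the use of a manually supplied clear-sky mask are our own illustrative assumptions rather than the exact formulation of [2]:

```python
import numpy as np

def hot_transform(red, blue, clear_mask):
    """Sketch of the haze-optimized transform (HOT).

    red, blue  : 2-D arrays of DN values for the red and blue bands.
    clear_mask : boolean array marking manually selected clear pixels,
                 used to fit the clear-sky line red = a * blue + b.
    """
    # Fit the clear-sky line on the clear pixels only.
    a, _ = np.polyfit(blue[clear_mask], red[clear_mask], 1)
    theta = np.arctan(a)
    # Rotate the blue-red coordinate system so that one axis lies
    # perpendicular to the clear-sky line; HOT is the displacement
    # of each pixel along that perpendicular axis.
    return blue * np.sin(theta) - red * np.cos(theta)
```

Pixels on the clear-sky line yield HOT values near zero, while hazy pixels, whose blue DN rises more than their red DN, yield positive HOT values.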
The decomposition method attempts to decompose the information in a thin-cloud area into two independent components, thin clouds and land cover; thin clouds are then detected and removed from the thin-cloud component only. As a result of the blurring effects of thin clouds, the thin-cloud area in a remote sensing image corresponds to the low-frequency domain. In contrast, clear areas with distinct boundaries correspond to the high-frequency domain. By applying high-pass filtering to thin-cloud images, the low-frequency components can be suppressed while the high-frequency components are amplified. This can eliminate or alleviate the blurring effect of thin clouds and preserve the radiometric information of the clear areas [
10]. The classic method processes all of the low-frequency components equally, which results in new distortions in the radiance of clear areas in an image. Ideally, filtering should be carried out purposefully for thin-cloud areas by introducing a frequency cut-off procedure [
Thin clouds in remote sensing images are manifested as a combination of thin clouds and the underlying surface; mixed-component decomposition attempts to estimate the proportions of these two basic components [
12]. To a large extent, the accuracy of this method depends on the component selection, the determination of the spectral characteristics, and the selected decomposition approach, which leads to poor performance when applied to images with heterogeneous land surfaces.
In everyday photographs, some pixels almost always possess extremely low values in at least one channel, such as surfaces in shadow and brightly colored objects. The band formed by these low values is referred to as the dark channel. The proportion of shadows or colored objects in images of natural scenes is very high, which results in a dark channel with low values. If the gray level of the dark channel of an image is large, the existence of fog or haze is indicated. On this basis, the thickness of the fog can be estimated to further perform image sharpening and defogging [
13]. At first, the dark object method was used for path radiance estimation in remote sensing images [
14]. This was then developed into a thin-cloud removal method called dark object subtraction (DOS), as presented in [
2,
3]. Meanwhile, features in dark pixels were also utilized for thin-cloud detection. For example, in the literature [
15], images were divided into mutually disjoint sub-blocks of a fixed size to search for dark pixels within them. Then, a haze thickness map (HTM), with a resolution identical to that of the original image, was obtained by cubic interpolation of the dark-pixel values. Finally, the thin-cloud area could be identified via threshold segmentation of the HTM.
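The block-based HTM construction of [15] can be sketched roughly as follows; the block size, the use of `scipy.ndimage.zoom` for the cubic interpolation, and the function name are our own assumptions, not the cited implementation:

```python
import numpy as np
from scipy.ndimage import zoom

def haze_thickness_map(gray, block=16, threshold=None):
    """Sketch of a dark-pixel haze thickness map (HTM).

    gray: 2-D gray-level array. The image is tiled into non-overlapping
    block x block sub-blocks, the darkest value in each block serves as
    its haze estimate, and the coarse grid is interpolated (cubic) back
    to full resolution. Thresholding the HTM yields a cloud mask.
    """
    h, w = gray.shape
    nh, nw = h // block, w // block
    # Minimum (darkest value) per sub-block.
    coarse = gray[:nh * block, :nw * block].reshape(nh, block, nw, block).min(axis=(1, 3))
    htm = zoom(coarse, (h / nh, w / nw), order=3)  # cubic interpolation
    return htm if threshold is None else htm > threshold
```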
Instead of acquiring accurate thin-cloud masks directly, the methods described above mostly obtain haze/thin-cloud maps that indicate the thickness of the haze or thin clouds, which can be used for thin-cloud removal in later stages with the proper methods, such as the DOS method [
2,
3], air–light elimination [
16], and the fusion-based method [
17,
18]. However, insufficient or excessive thin-cloud removal can easily result in new radiometric distortions in the thin-cloud removed image. In addition, the increased availability of remote sensing images with similar spatial resolutions, band settings, and short time intervals enables the distorted data incurred by thin clouds to be directly abandoned or dealt with later by users. Therefore, it is of great importance to develop accurate thin-cloud mask methods.
To address the problems mentioned above, a thin-cloud mask approach based on sparse dark pixel region detection is proposed in this study. The main contributions of our method are twofold. (1) We introduce a distinctive feature (the density of dark pixels) and a nonparametric measure (the area of the Thiessen polygon) for the thin-cloud mask. Extensive studies have confirmed that dark pixels are widely present and scattered throughout the clear parts of images, whereas they are sparse in thin-cloud areas, which makes the dark-pixel density a robust feature. This distinctive feature, combined with the nonparametric measure, can automate the thin-cloud mask process. (2) To obtain a pixel-based cloud mask, the BSHTI is used to transform the original image into the transformed space. The image is then segmented locally for every thin-cloud candidate, which suppresses salt-and-pepper noise and separates bright land covers.
The remaining parts of this paper are organized as follows. The principle of this approach will be introduced in
Section 2. Then, we will present the detailed implementation of our method in
Section 3.
Section 4 will give the experimental settings and results. The applicability of our method is discussed in a wide range of settings, such as the different percentages of clouds and the underlying land cover, in
Section 5. Finally, the entire study will be analyzed and discussed in
Section 6.
2. Principle
The visual and near-infrared bands, which are commonly used in passive remote sensing, are highly sensitive to clouds. The effects of thin clouds on solar radiation include reflection, transmission, and absorption. Reflection includes two main effects, scattering and specular reflection, whose intensities depend on the thickness of the clouds. In cases where the clouds are rather thick, solar radiation is almost entirely reflected, and the cloudy area in a remote sensing image is characterized by high reflectance. When the clouds are thin, the main effect is scattering, which causes solar radiation to scatter in an approximately uniform manner in all directions in the visual and near-infrared bands. Consequently, the pixels of the thin-cloud area in a remote sensing image gain approximately equal scattered radiation values, which are recorded by the sensor. Visually, this process is equivalent to mixing a certain proportion of a white component into the image. At the same time, only part of the solar radiation penetrates the clouds (i.e., transmission) to reach the land surface, which reduces the surface reflection received by the sensor. Visually, the contrast in the thin-cloud area drops correspondingly.
Consistent with the above analysis, under thin-cloud conditions, the radiance received by the sensor can be simplified and expressed as follows:

R = αA + B + C, (1)

where R refers to the value of radiance (e.g., the DN value) measured by the sensor, A represents the value of the solar radiation received by the Earth's surface, α is the surface reflectance ratio, and B and C represent the scattered radiance and the reflected radiance caused by the thin cloud, respectively. The reflection and scattering phenomena of clouds may change the transmission direction of solar radiation; this portion of the radiance is recorded by the sensor before it interacts with the land cover and is generally referred to as the path radiance. Since the path radiance is received by the sensor before it reaches the Earth's surface, it contains no surface information and serves as a significant source of information loss and noise in passive remote sensing.
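As a purely illustrative numerical example (the values below are invented, not measurements), consider a dark pixel under clear sky and under a thin cloud:

```python
# Illustrative values only: a dark pixel under clear sky vs. thin cloud.
A = 100.0         # solar radiation reaching the Earth's surface
alpha = 0.05      # surface reflectance ratio of a dark pixel (e.g., shadow)
B, C = 20.0, 5.0  # scattered and reflected radiance added by the thin cloud

R_clear = alpha * A          # clear sky: B = C = 0, so R = 5.0 (a dark pixel)
R_cloud = alpha * A + B + C  # thin cloud: R = 30.0 (no longer dark)
```

The additive terms B and C raise the floor of the observed radiance, which is exactly why no extremely dark pixels survive inside a thin-cloud area.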
Common dark pixels include mountain shadows, soil (black soil), pure water bodies, dense vegetation, and brightly colored objects. In other words, dark pixels exist widely in richly textured clear images [
14,
19] and are scattered throughout the whole image. The effects of thin clouds on atmospheric radiance are mainly characterized by scattering; therefore, all pixels in a thin-cloud area contain approximately equivalent scattered radiation in the visual and near-infrared channels. Since the radiometric quantities of dark pixels also increase, no dark pixels with low gray levels exist in thin-cloud areas. For this reason, the presence or absence of dark pixels in a local area can be used to evaluate the existence of thin clouds. In this paper, a thin-cloud mask method based on sparse dark pixel region detection is proposed. The novelties of the proposed method are mainly reflected in (1) the proposal of a multiresolution dark pixel detection method that is suitable for thin-cloud masking; and (2) the introduction of the Thiessen polygon area as a dark-pixel density measure, with which thin-cloud area candidates can be identified and the thin clouds masked locally for each candidate. In this manner, the accuracy of the cloud mask can be improved.
3. Methods
The thin-cloud mask method presented in this paper included three major steps: dark pixel extraction, sparse dark pixel region detection, and the BSHTI transform with local thin-cloud segmentation (as shown in
Figure 1). Below, the implementation details are illustrated.
3.1. Dark Pixel Extraction
For a thin-cloud region, the electromagnetic radiation intensity (e.g., the DN value, reflectance, and the gray level, which we denoted hereafter as the gray level for simplicity) received by the sensor was contaminated by the path radiance. The path radiance increased the observation value for all of the pixels in the thin-cloud area of the remote sensing image, indicating that no dark pixels with extremely low gray levels existed in the visible and near-infrared bands of that area. However, dark pixels in the clear areas of remote sensing images were widely existent and scattered throughout the image. Based on this distinct feature, the thin-cloud area detection problem could be resolved by detecting areas with sparse dark pixels. Consequently, the definition and extraction of dark pixels were crucial to the successful implementation of this algorithm.
In traditional methods, a pixel whose gray value was smaller than a user-defined threshold, such as the smallest percentage (P) of pixels, was recognized as a dark pixel. However, large areas of shadows, pure waterbodies, and dark-colored vegetation were often clumped together and formed dark-pixel patches, which had a negative effect when measuring the density of dark pixels. To overcome this problem, a dark pixel was defined within a local window (i.e., an area of w × w pixels) to ensure a relatively uniform distribution. Accordingly, the dark pixel extraction method was designed as follows:
(1) A cloudy image with n band(s), denoted as I = {I_1, I_2, …, I_n}, was used to carry out the thin-cloud mask, where I_i represents the ith band of the image. The minimum values over all of the bands were extracted to form a new band, represented by I_min. For a pixel in the ith row and the jth column, I_min could be expressed by the following equation:

I_min(i, j) = min(I_1(i, j), I_2(i, j), …, I_n(i, j)), (2)

where min represents the minimization operation.
(2) For the band I_min, a pixel whose gray level was smaller than the user-defined threshold T was selected as a dark-pixel candidate. T is defined using the following formula:

T = min{ g : H(g) > P }, (3)

where H represents the cumulative histogram of band I_min, and T represents the smallest gray level whose cumulative percentage in H is larger than P. The problem of selecting T is thus transformed into the determination of P, which indicates the proportion of pixels with the potential to become dark pixels.
As mentioned above, the image content varied; some images contained large portions of dark pixels clumped together, so it was unreasonable to use an identical P for all of the images. Considering this problem, a multiresolution method was utilized in this paper to define the value of P. The image was divided into groups of non-overlapping patches, expressed as {B_s}, where B_s represents a patch with a size of s × s pixels. To eliminate the negative effects of large areas of clumped dark pixels, only the one pixel at the center of each patch was used to calculate the cumulative histogram.
For the resolution s, the threshold T_s of the dark pixels was determined via Equation (3). Lastly, the final threshold T was obtained by selecting the maximum threshold over all of the resolutions:

T = max_s T_s. (4)

On the basis of this threshold, the pixels with I_min(i, j) < T were selected as dark-pixel candidates.
(3) For a dark-pixel candidate in the ith row and the jth column, we defined a window of size w (i.e., w × w pixels) surrounding the test pixel. When this pixel was the smallest in the window, it was identified as a dark pixel and flagged with 1; otherwise, it was flagged with 0. The dark-pixel band acquired in this way is denoted as D:

D(i, j) = 1 if I_min(i, j) is the minimum within its w × w window; D(i, j) = 0 otherwise. (5)

The dark pixels obtained by this method were defined as feature points. Assuming that m feature points were extracted, they are expressed as follows:

F = {f_1, f_2, …, f_m}. (6)
Therefore, only one dark pixel was selected in each local window to ensure that the feature points were uniformly distributed with similar densities in the clear areas. However, in the thin-cloud areas, there were either none or only a few dark pixels, which resulted in a low density.
Figure 2 shows the dark pixels extracted by the method proposed in this study (w = 7, P = 30%). It could be observed that the dark-pixel feature points were numerous and uniformly distributed in urban areas and in areas of dense vegetation. By contrast, no or few dark pixels were found in thin-cloud areas; in other words, the dark pixels in these areas were sparse. Based on the above analysis, thin-cloud detection became a problem of extracting the regions where dark pixels were sparse.
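Steps (1)-(3) can be sketched as follows; the multiresolution selection of P is omitted for brevity (a single global percentile is used instead), and the function name and the use of `scipy.ndimage.minimum_filter` are our own assumptions:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def extract_dark_pixels(bands, w=7, p=30.0):
    """Sketch of dark-pixel extraction from an (n_bands, H, W) stack.

    Returns a boolean map that is True where a dark pixel was found.
    """
    # Step (1): per-pixel minimum over all bands.
    i_min = bands.min(axis=0)
    # Step (2): candidates below the p-th percentile of the gray levels
    # (a single global P here; the paper selects P over multiple resolutions).
    t = np.percentile(i_min, p)
    candidates = i_min < t
    # Step (3): keep a candidate only if it is the minimum of its w x w window.
    return candidates & (i_min == minimum_filter(i_min, size=w))
```

Because at most one pixel can be the window minimum in a neighborhood, the surviving dark pixels are spread out with roughly one point per w × w cells in clear areas.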
It should be noted that w and P were two essential parameters that decided whether or not a pixel could become a dark pixel, thereby determining the quantity and distribution of the dark pixels. The parameter setting method and its influence on the accuracy of the thin-cloud mask will be discussed in detail in
Section 5.1.
3.2. Sparse Dark Pixel Region Extraction
Traditionally, point density was mostly measured parametrically by counting the number of points in a local window. The accuracy of these methods relied on the window shape and parameters (such as the window size), which had to be selected carefully. When the feature points were distributed uniformly in a regular shape, all of these approaches could yield favorable results. However, the sparse feature-point areas formed by thin clouds had irregular shapes, different sizes, and notable differences in density. Therefore, it was a challenge to adopt a window-based feature-point density measurement, as failure to effectively estimate sparse feature-point densities might lead to the missed detection of thin clouds or a low accuracy of the thin-cloud boundary. To address this problem, a nonparametric point density measurement method was utilized in this study.
The Thiessen polygon is a commonly used zoning method. In a Thiessen polygon tessellation, the distance from any position inside a polygon to its corresponding feature point is shorter than the distance to any other feature point. Therefore, only one feature point exists within the range of each Thiessen polygon; in other words, a one-to-one correspondence exists between the feature points and their Thiessen polygons.
Figure 3 shows the Thiessen polygon corresponding to the feature points in
Figure 2. It was clear that the area of the polygon was large when the feature points were sparse. According to this fact, a sparse dark pixel region extraction method was designed as follows:
(1) Thiessen polygons, which divided the image region into disjoint subsets, were established according to the feature point set F. Then, their areas were calculated and expressed as follows:

S = {s_1, s_2, …, s_m}, (7)

where s_k represents the area of the Thiessen polygon corresponding to the feature point f_k, which indicates the density of the feature points around f_k.
Delaunay triangulations and Thiessen polygons are dual diagrams, which means that the Thiessen polygons can be established through the Delaunay triangulation. We constructed the Delaunay triangulation using the max-min angle rule, which can be implemented by multiple classic methods and an improved method.
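In practice, the Thiessen polygon areas of step (1) can be obtained from a standard Voronoi implementation; the sketch below uses `scipy.spatial.Voronoi` and the shoelace formula, and the treatment of unbounded border cells (area set to infinity) is our own simplification:

```python
import numpy as np
from scipy.spatial import Voronoi

def thiessen_areas(points):
    """Area of the Thiessen (Voronoi) polygon of each feature point.

    points: (m, 2) array of feature-point coordinates.
    Unbounded border cells get area = inf; bounded cells use the
    shoelace formula on their vertex ring.
    """
    vor = Voronoi(points)
    areas = np.full(len(points), np.inf)
    for i, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if len(region) == 0 or -1 in region:
            continue  # unbounded cell on the convex-hull border
        poly = vor.vertices[region]
        x, y = poly[:, 0], poly[:, 1]
        # Shoelace formula for the polygon area.
        areas[i] = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
    return areas
```

A practical implementation would clip the border polygons to the image extent instead of marking them infinite.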
(2) The Thiessen polygons were divided into two mutually disjoint subsets, referred to as the sparse set (S_sp) and the dense set (S_de). The former represented the areas where feature points were sparse, which indicated thin-cloud candidates, and the latter represented the areas where they were dense, i.e., the clear areas. The initial values were set as S_sp^0 = ∅ and S_de^0 = S, respectively; the superscript 0 denotes the iteration number.
(3) The average μ and the standard deviation σ of the polygon areas in the dense set were calculated using the following formulas:

μ = (1/N) Σ_{s_k ∈ S_de} s_k, (8)

σ = sqrt( (1/N) Σ_{s_k ∈ S_de} (s_k − μ)^2 ), (9)

where N represents the number of elements in set S_de.
(4) A large Thiessen polygon area indicated that the feature points near the corresponding feature point were sparse. Considering this, polygons satisfying Equation (10) were treated as abnormal values of S_de and were eliminated from the set:

s_k > μ + λσ, (10)

where λ represents a multiple of the standard deviation, which we set to 3 in this paper. As only one dark pixel was selected in each local window of w × w pixels, ideally, the areas of the Thiessen polygons in the clear part of the image obeyed a normal distribution whose mean μ approached w × w. By repeating the experiment with different parameters, λ = 3 was deemed appropriate to separate the dense dark-pixel regions from the sparse regions.
The eliminated polygons were added to the sparse set. In this way, two new sets were acquired, denoted as S_sp^1 and S_de^1.
(5) S_sp^1 and S_de^1 were utilized as the new initial values to repeat steps (3) and (4); the iteration was repeated until no more polygons in set S_de met the condition of Equation (10).
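Steps (2)-(5) amount to iterative sigma-clipping of the polygon areas; a minimal sketch, assuming finite areas and λ = 3 (the function name is ours):

```python
import numpy as np

def split_sparse_dense(areas, lam=3.0):
    """Iteratively split Thiessen-polygon areas into sparse/dense sets.

    areas: 1-D array of finite polygon areas.
    lam  : standard-deviation multiple (3 in the paper).
    Returns a boolean array, True for polygons in the sparse set.
    """
    sparse = np.zeros(len(areas), dtype=bool)
    while True:
        dense = areas[~sparse]
        mu, sigma = dense.mean(), dense.std()
        # Abnormally large cells indicate locally sparse feature points.
        outliers = (~sparse) & (areas > mu + lam * sigma)
        if not outliers.any():
            return sparse  # no polygon meets the condition any more
        sparse |= outliers
```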
(6) The regions enclosed by the polygons in the sparse set were recognized as thin-cloud area candidates. The connected pixels formed thin-cloud patches, which were treated as thin-cloud candidates and expressed as follows:

C = {c_1, c_2, …, c_q}, (11)

where c_k denotes the kth connected thin-cloud patch.
As certain dark pixels existed near the thin clouds, the candidate thin-cloud area was larger than the real thin-cloud area in most cases, which incurred false detections. Meanwhile, the detection might have missed some thin clouds and their shadows if an improper area threshold was adopted. Therefore, the boundary of the thin cloud had to be further refined by additional methods.
3.3. BSHTI Transform and Local Segmentation
As a result of the diversity of the land cover and the various thicknesses of thin clouds, it is difficult to represent thin clouds and backgrounds via a unique feature with wide applicability. In addition, single-band segmentation has drawbacks in fully exploiting the information contained in a multiband remote sensing image. According to [
8], the BSHTI transform, which synthesized the multiband image into a single band, had the capacity to suppress the background and highlight thin clouds. The new band I_BSHTI, obtained by the BSHTI transform, was a linear weighted sum calculated from the multiband image. The expression was given as follows:

I_BSHTI = k_1 I_1 + k_2 I_2 + … + k_n I_n = K^T I, (12)

where K = (k_1, k_2, …, k_n)^T represents the series of conversion coefficients for the multiple bands.
The objective of this transform was to make the mean value of the thin-cloud area in the BSHTI band as high as possible, whereas the values in the clear area should be distributed as compactly as possible; that is, the standard deviation of the clear area in I_BSHTI should be as small as possible:

min_K K^T R K. (13)

This problem could be solved by searching for an optimal solution of K under the constraint condition of Equation (14):

K^T (M_h − M_c) = const. (14)

The constrained problem could be optimized by the following formula:

K ∝ R^(−1) (M_h − M_c), (15)

where K represents the series of conversion coefficients for all of the bands, which was the parameter to be determined; R is the correlation-coefficient matrix of all of the bands, denoted by an n × n matrix; and M_c and M_h are the mean values of all of the bands in the clear and thin-cloud areas, respectively. On the basis of obtaining the value of K, M_c could be set to 0 for the convenience of solving K.
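Under one reading of the constrained formulation above, the weights follow in closed form from the Lagrange condition, K proportional to R⁻¹(M_h − M_c); the sketch below estimates R from the clear-area samples, which, like the function name, is our own assumption:

```python
import numpy as np

def bshti_coefficients(cloud_pixels, clear_pixels):
    """Sketch of solving for the BSHTI conversion coefficients K.

    Minimizing the clear-area spread K^T R K at a fixed cloud/clear
    contrast K^T (M_h - M_c) gives, by the Lagrange condition,
    K proportional to R^{-1} (M_h - M_c).
    cloud_pixels, clear_pixels: (N, n_bands) training samples.
    """
    m_h = cloud_pixels.mean(axis=0)              # thin-cloud mean per band
    m_c = clear_pixels.mean(axis=0)              # clear-area mean per band
    r = np.corrcoef(clear_pixels, rowvar=False)  # n x n band correlation
    k = np.linalg.solve(r, m_h - m_c)
    return k / np.linalg.norm(k)                 # K is defined only up to scale
```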
As a supervised method, the thin-cloud and clear areas should be manually interpreted to calculate the statistics. In this study, the thin-cloud area candidates and the clear area obtained in
Section 3.2 were adopted to train the K parameters directly. The conversion result of the image in
Figure 2 is presented in
Figure 4, which shows the improved capacity to highlight the thin-cloud areas and concentrate the various background areas.
False and missed detections of thin clouds might appear in the regions adjacent to the thin-cloud area candidates. Therefore, the thin-cloud areas were iteratively buffered by 1 pixel until the following condition was satisfied:

N_b ≥ N_c, (16)

where N_b refers to the number of pixels additionally included due to buffering, and N_c represents the number of pixels in the candidate cloudy area. Equation (16) indicates that the proportion of thin-cloud to background pixels approaches a 1:1 ratio. Then, the Otsu algorithm was employed to define a segmentation threshold for the buffered area in order to locally separate the thin-cloud area from the clear area. Finally, the thin-cloud mask results were obtained by repeating the morphological operations a set number of times (four in this paper) to remove the rough boundaries and fill in the missing data.
5. Discussion
5.1. Parameter Setting
The number and distribution of the dark pixels had a direct influence on the ability to separate sparse dark-pixel areas from the whole image using threshold segmentation. By determining whether or not a pixel could become a dark pixel, the window size w and the percentage threshold P were two key parameters, which should be set carefully by the user. Of these, w represents the size of the local window in which a single dark pixel may exist, and P defines, as a percentile, the maximum acceptable gray level for a pixel to become a dark pixel. An excessively small w and large P could easily result in too many dark pixels; if these were distributed in a thin-cloud area, the corresponding thin-cloud area might be identified as a clear area. On the other hand, when w was too large and P too small, the number of dark pixels might decrease significantly and become insufficient for separating highly reflective land surfaces, causing them to be identified as thin clouds.
To evaluate the influence of the parameter setting on the accuracy of the thin-cloud mask, the overall accuracy (OA) of the thin-cloud detection was calculated with different combinations of w and P, where w changed between 7 and 19 with a step of 2, and P varied from 5% to 40% with a step of 5%.
Figure 8 shows the OA error bar with changing w and P values; it could be observed that parameter variations within the above ranges had a minor influence on the OA of the algorithm, as the choice was simply a tradeoff between missed detection and false detection. The results demonstrated that the algorithm was robust over a wide range of parameter settings; therefore, we suggest setting w to 7 and P to 30% to obtain an average accuracy with an acceptable true ratio and detection ratio.
5.2. Separation of Thin Clouds from Confusing Land Covers
It is well recognized that thin clouds are often confused with highly reflective landscapes, such as urban areas or bright bare soils. As a mixed landscape, urban areas are composed of impervious surfaces (e.g., buildings and roads), vegetation, soil, and waterbodies. Of these, dense vegetation, waterbodies, and the shadows of tall buildings and trees can become dark pixels, which are scattered around the highly reflective pixels of the urban area in a mosaic pattern. Similarly, a certain number of dark pixels, such as shadows and low-reflection pixels, exist in highly reflective bare soil areas. The proposed thin-cloud detection method captured the unique feature that the thin-cloud portions of a remote sensing image are mixed with a white component in the visual and near-infrared bands, so that no pixels with extremely low values exist in these areas. Since this method detected thin-cloud areas via dark pixels instead of high reflectance values, the confusion between highly reflective landscapes and thin clouds was resolved. Additionally, the experimental results confirmed that the proposed method possessed the capacity to separate confusing land surfaces from thin clouds. In addition, the method identified thin clouds via local segmentation of each thin-cloud candidate, which suppressed the salt-and-pepper noise induced by highly reflective landscapes.
As a general approach for thin-cloud detection, this method was robust for both clear images and images fully covered by clouds. For a clear, texture-rich image, where dark pixels are scattered throughout the whole image, the areas of the Thiessen polygons obey a normal distribution and are concentrated in the range of the mean plus or minus three times the standard deviation. Through threshold segmentation of the Thiessen polygon areas, few or no sparse dark-pixel areas were identified, which caused few or no false detections in clear, texture-rich images. For an image fully covered by clouds, the dark pixels were sparsely distributed, which exhibited features similar to those in a clear image. However, the minimum gray value, or the absolute dark-pixel density, was larger than that of a clear image, so an additional threshold was required to separate images full of clouds from clear images.
Similar to thin clouds, thick clouds result in missing or distorted landscape information in the remote sensing image. In this respect, there was no need to separate thin clouds from thick clouds. However, semitransparent thin clouds in remote sensing images contain a certain proportion of landscape information; the thin clouds can thus be unveiled from the landscape information to obtain a synthetically clear image via an appropriate method (e.g., DOS [
2,
3], air–light elimination [
16], and the fusion based method [
17,
18]). Since the main objective of our method focused on the identification of the distorted area, no clear separation was made between thin clouds and thick clouds. This method was robust for large, thick clouds. However, it might have omitted small, thick clouds whose shadows were distributed nearby.
However, when an image was covered by a large waterbody, this method might fail to mask thin clouds accurately. The spectral signature of clear water is characterized by low reflectance (especially in the near-infrared band); when thin clouds obscure this type of water and an inappropriate global gray-level threshold P is used, several dark pixels may still be identified within the thin-cloud area, which results in missed detections over the water surface. Conversely, because of the effects of specular reflection and high sediment concentrations, several waterbodies are characterized by high or middle reflectance values, which results in few or no dark pixels on the smooth, bright water surface. Consequently, false detections could occur.