3.3. Design of Multiple Channels for Shadow Detection
In this paper, the design of multiple channels considers both the properties of shadows and the interference of special remote-sensing features with shadow detection. The shadow channels should reflect the characteristics of the shadow region in the HSI color space. Shadow areas are caused by objects blocking sunlight; thus, they have low intensity [44]. The saturation in the shadow region is high owing to atmospheric Rayleigh scattering. The Phong illumination model shows that shaded areas often have high hue values [45].
Figure 2 shows the grayscale maps of the multiple channels used in shadow detection for the same image. Figure 2a,b shows the original image and the manually labeled shadow region, and Figure 2c displays the grayscale map of the H channel. As observed in Figure 2c, areas with high hue (lawns, football fields, etc.) cannot be distinguished from real shadow areas through the hue channel alone. However, the intensity of these high-hue areas is higher than that of the shadow areas, as shown in the I channel in Figure 2f. On the basis of the characteristics of shadow areas (i.e., high hue and low intensity), a new channel, the H-I channel, is designed in this study. The rationale of the H-I channel is that it strengthens the difference between the H and I channels and thereby separates high-hue objects (lawns, football fields, etc.) from shadow regions. In the H-I channel, the values of shadow areas are higher than those of high-hue objects in the nonshadow region, so the shadow areas appear white in Figure 2d and can be easily separated.
Dark objects (e.g., black roofs or gray roads) with high hue and low intensity also occur in nonshadow areas, and they are easily mistaken for shadows in the H-I channel. Therefore, a saturation channel is also considered. These dark objects in nonshadow areas (i.e., the misjudged shadows) are usually less saturated and brighter than real shadow areas, so the shadow regions detected through the H-I channel are further refined by removing low-saturation and high-intensity pixels. Specifically, the initial shadow-detection results are first obtained through the H-I channel. The initially detected shadow regions are then processed in the saturation and intensity channels, and the intersection of the regions detected in the different channels is taken as the final shadow region. Accurate shadow-detection results can be realized through the designed detection channels and framework.
The ground truth in this paper is the shadow mask provided by manual annotation in the AISD dataset. Owing to the large number of shadows of various types and shapes in the selected aerial images, the shadow masks were carefully annotated by Luo et al. [2]. In recent years, the AISD dataset has been widely used for evaluating shadow-detection performance [2,46]. As shown in Figure 2g, the detected shadow is consistent with the ground truth.
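The channel combination described in this section can be illustrated with a minimal sketch. The exact definition of the H-I channel is not reproduced here, so the sketch assumes that it is the difference H − I rescaled to [0, 1]. The function name `detect_shadow` and the threshold parameters `t_hi`, `t_s`, and `t_i` are hypothetical.

```python
import numpy as np

def detect_shadow(h, s, i, t_hi, t_s, t_i):
    """Combine the H-I, S, and I channels into one boolean shadow mask.

    h, s, i: float arrays in [0, 1] holding the HSI components.
    t_hi, t_s, t_i: per-channel thresholds (hypothetical parameter names).
    """
    # Assumption: the H-I channel is the difference H - I rescaled to [0, 1].
    hi = (h - i + 1.0) / 2.0
    candidate = hi > t_hi   # high hue combined with low intensity
    saturated = s > t_s     # removes low-saturation dark objects
    dark = i < t_i          # removes brighter nonshadow objects
    # The final shadow region is the intersection of the three detections.
    return candidate & saturated & dark
```

In practice, the three thresholds would come from the adaptive threshold search rather than being fixed by hand.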
3.4. Threshold Determination Using the DLA-PSO Algorithm
Another issue is how to select the optimal threshold for each channel (i.e., the H-I, S, and I channels) adaptively and accurately. In this study, DLA-PSO is proposed on the basis of the traditional Otsu algorithm and PSO. PSO is a classical intelligent optimization algorithm that has been widely used in image registration [47,48,49] and image segmentation [50,51,52]. Its advantages include simplicity, few parameters, and fast convergence [53]. The thresholds in shadow detection can therefore be determined by using the PSO algorithm. In PSO, an optimization function must be set to judge the fitness of a particle before and after it moves. On the basis of this fitness function, the individual optimal value ($\mathit{pbest}_i$) of particle $i$ and the global optimal value ($\mathit{gbest}$) can be found. Shadow detection is itself a pixel-level binary classification task, and the interclass-variance formula divides an image into foreground and background according to its gray-level characteristics. For shadow detection, the threshold of each channel is therefore evaluated by the interclass variance: the maximum interclass variance corresponds to the minimum misclassification probability. Given a particle swarm of size $n$, the velocity of each particle is initialized randomly. The optimal threshold for shadow detection is obtained by maximizing the interclass variance, which can be calculated as:
$$\sigma^2 = \omega_0(\mu_0 - \mu)^2 + \omega_1(\mu_1 - \mu)^2$$
where $\sigma^2$ represents the interclass variance, $\omega_0$ represents the proportion of shadow pixels in the overall image, and $\omega_1$ represents the proportion of nonshadow pixels in the overall image. $\mu_0$ represents the average gray value of the shadow pixels, and $\mu_1$ represents the average gray value of the nonshadow pixels. $\mu$ represents the average gray value of the whole image, which can be expressed as $\mu = \omega_0\mu_0 + \omega_1\mu_1$. Thus, the simplified formula of the interclass variance can be obtained as:
$$\sigma^2 = \omega_0\omega_1(\mu_0 - \mu_1)^2$$
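As an illustration of the interclass-variance criterion, the following sketch evaluates the simplified form (the product of the two class proportions and the squared difference of the class means) for a given threshold and finds the best threshold by exhaustive search. It assumes an 8-bit channel and assigns pixels at or below the threshold to one class; the function names are hypothetical, and the exhaustive search is only a baseline against which a swarm-based search can be compared.

```python
import numpy as np

def interclass_variance(gray, t):
    """Interclass variance of a channel split at threshold t.

    Assumption: values <= t form one class and values > t form the other.
    """
    gray = np.asarray(gray, dtype=np.float64).ravel()
    low, high = gray[gray <= t], gray[gray > t]
    if low.size == 0 or high.size == 0:
        return 0.0  # a one-class split carries no separation
    w0, w1 = low.size / gray.size, high.size / gray.size
    return w0 * w1 * (low.mean() - high.mean()) ** 2

def otsu_threshold(gray):
    """Exhaustive search over 0..255 for the variance-maximizing threshold."""
    scores = [interclass_variance(gray, t) for t in range(256)]
    return int(np.argmax(scores))
```

Which of the two classes corresponds to shadow depends on the channel: shadow pixels lie above the threshold in a channel where shadows are bright and below it in the intensity channel.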
To solve the local-optimum problem of the classical PSO algorithm, a DLA-PSO algorithm is proposed in this study. In the $k$-th iteration, the PSO algorithm changes the velocity and position of particle $i$ in accordance with the velocity-update and position-update formulas, which can be expressed as:
$$v_i^{k} = w\,v_i^{k-1} + c_1 r_1\left(\mathit{pbest}_i - x_i^{k-1}\right) + c_2 r_2\left(\mathit{gbest} - x_i^{k-1}\right) \qquad (9)$$
$$x_i^{k} = x_i^{k-1} + v_i^{k} \qquad (10)$$
where $v_i^{k}$ and $x_i^{k}$ are the velocity and position of particle $i$ in the $k$-th iteration, $\mathit{pbest}_i$ is the individual optimal position of particle $i$, and $\mathit{gbest}$ is the global optimal position. $c_1$ and $c_2$ are the learning factors, which are usually set to 2. $r_1$ and $r_2$ are random factors between 0 and 1, which are used to enhance the performance of the PSO algorithm [53,54,55]. In Formula (9), the inertia weight ($w$) determines the influence of prior knowledge on the current velocity and affects the convergence speed of the algorithm. When $w$ is too small, the historical velocity has too little influence on the current velocity, so the particle moves only within a local range and settles on a local optimal solution. When $w$ is too large, the historical velocity has a heavy influence, so the particles still move over a wide range in the later period, which leads to nonconvergence. A large $w$ should therefore be specified at the initial stage so that particles can find the optimal value domain as soon as possible and avoid local optima, whereas a small $w$ should be given at the later stage so that a stable search speed is maintained and the optimal threshold can be searched accurately within the optimal value domain. This balances the global search speed and the local optimization effect. For this reason, a dynamic inertia weight is introduced, which is calculated from the current iteration number ($k$), the total number of iterations, the maximum weight ($w_{\max}$), and the minimum weight ($w_{\min}$). Here, $w_{\max}$ is set to 0.95 and $w_{\min}$ is set to 0.4 [56].
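The classical update with a decreasing inertia weight can be sketched as follows. The exact form of the dynamic inertia weight is not reproduced here, so the sketch assumes a commonly used linear decay from the maximum weight to the minimum weight over the iterations. The weights 0.95 and 0.4 and the learning factors of 2 are taken from the text; the function names are hypothetical, and each particle is a scalar threshold.

```python
import random

W_MAX, W_MIN = 0.95, 0.4   # maximum and minimum weights stated in the text
C1 = C2 = 2.0              # learning factors stated in the text

def inertia_weight(k, k_max):
    """Assumed linear decay from W_MAX (k = 0) to W_MIN (k = k_max)."""
    return W_MAX - (W_MAX - W_MIN) * k / k_max

def pso_step(x, v, pbest, gbest, k, k_max, rng=random):
    """One classical velocity and position update for a scalar particle."""
    w = inertia_weight(k, k_max)
    r1, r2 = rng.random(), rng.random()
    # Prior-knowledge term plus individual and global attraction terms.
    v_new = w * v + C1 * r1 * (pbest - x) + C2 * r2 * (gbest - x)
    return x + v_new, v_new
```

With this schedule, early iterations keep most of the previous velocity and explore widely, while later iterations damp the velocity and search near the best positions found so far.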
From Formula (9), the classical PSO algorithm adjusts the particle velocity by using the individual optimal position ($\mathit{pbest}_i$) and the global optimal position ($\mathit{gbest}$). When the influence of the global optimal position is too large, the particles converge quickly in the early stage and fall into a local optimum. However, when the influence of the individual optimal position is too large, nonconvergence occurs in the late search period owing to the lack of global information. To alleviate this contradiction, neighborhood particles are considered, and the current iteration and the maximum iteration are combined to design the $\beta$ value. Global PSO and local PSO are therefore combined to define the velocity-updating formula, in which Formula (13) represents the velocity update based on the global PSO and Formula (14) represents the velocity update based on the optimization of particles in the local neighborhood.
The particle-velocity-update formula is divided into four parts. The first part is the product of the particle velocity in iteration $k-1$ and the dynamic inertia weight, which is regarded as the prior-knowledge part of particle $i$. The second part is the local perception part, which is the distance between the current position of particle $i$ and its own individual optimal position, and it reflects the self-cognition of particle $i$. The third part is the global perception part, which is the distance between the global optimal position and the current position of the particle, and it reflects the global cognition of particle $i$. The fourth part is the neighborhood perception part, which is the distance between the neighborhood particle and the current position of the particle, and it reflects the communication and information sharing between particle $i$ and its companions. Thus, at the beginning of the iteration, the global update is more suitable for searching the approximate range of the optimal thresholds. At the same time, the existence of nearby particles limits the local-optimum problem caused by excessive velocity. In the late search process, more consideration is given to the influence of adjacent particles and the current particle, so that the optimal threshold can be found accurately within the rough range determined in the early stage and the problem of nonconvergence is avoided. The framework of the DLA-PSO algorithm is illustrated in Algorithm 1.
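Algorithm 1: Framework of the DLA-PSO algorithm
Input: designed channels (H-I, I, S); n; Max_iteration
Output: optimal thresholds of the designed channels
Initialization: c1 = 2, c2 = 2; r1, r2 = rand()
while (k < Max_iteration) do
    Calculate the fitness function according to the Otsu algorithm
    Set the individual and global optimal particle values in the k-th iteration
    for i = 1 to n do
        if the fitness of particle i exceeds its individual optimal value, then update the individual optimal value
        if the fitness of particle i exceeds the global optimal value, then update the global optimal value
        Update the velocity and position of particle i
return the global optimal value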
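The overall loop can be sketched as follows for a single scalar threshold. This is an illustration under stated assumptions, not the authors' implementation: the inertia weight is assumed to decay linearly, the neighborhood is assumed to be a ring (each particle and its two index neighbors), and the blending factor `beta` is assumed to grow linearly with the iteration so that the neighborhood update gains weight late in the search. The function name `dla_pso_sketch` and all parameter names are hypothetical.

```python
import random

def dla_pso_sketch(fitness, lo, hi, n=20, k_max=50, seed=0):
    """Maximize fitness(x) on [lo, hi] with a global/neighborhood blended PSO."""
    rng = random.Random(seed)
    c1 = c2 = 2.0
    w_max, w_min = 0.95, 0.4
    x = [rng.uniform(lo, hi) for _ in range(n)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    pbest = x[:]
    pfit = [fitness(p) for p in pbest]
    for k in range(k_max):
        w = w_max - (w_max - w_min) * k / k_max    # assumed linear decay
        beta = k / k_max                           # assumed neighborhood share
        g = max(range(n), key=lambda j: pfit[j])   # index of the global best
        for i in range(n):
            # Assumed ring neighborhood: the particle and its two neighbors.
            ring = [(i - 1) % n, i, (i + 1) % n]
            l = max(ring, key=lambda j: pfit[j])
            r1, r2 = rng.random(), rng.random()
            own = c1 * r1 * (pbest[i] - x[i])
            v_glob = w * v[i] + own + c2 * r2 * (pbest[g] - x[i])
            v_loc = w * v[i] + own + c2 * r2 * (pbest[l] - x[i])
            v[i] = (1.0 - beta) * v_glob + beta * v_loc
            x[i] = min(hi, max(lo, x[i] + v[i]))   # keep the threshold in range
            f = fitness(x[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = x[i], f
    g = max(range(n), key=lambda j: pfit[j])
    return pbest[g], pfit[g]
```

For threshold selection, `fitness` would be the interclass variance of one channel evaluated at a candidate threshold, and the search would be run once per channel.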
3.5. Regional Optimization
The initial shadow-detection results are obtained by using the proposed DLA-PSO algorithm. However, small low-brightness objects (e.g., black vehicles on the road) in nonshadow areas and bright ground objects (e.g., white vehicles) in shadow regions are often misjudged. Two types of regional optimization are therefore adopted in the proposed approach to further improve the shadow-detection precision. First, the connected regions in the shadow-segmentation results are computed to identify low-brightness objects in the nonshadow area. To eliminate these small objects, a spatial lower limit on the region area must be given. In this paper, the limit is set according to the spatial resolution of the experimental dataset and the real-life size of vehicles. The spatial resolution of the AISD dataset is 0.3 m, meaning that one pixel in the image represents an actual area of 0.09 square meters. A vehicle is about 4–6 m long and about 2 m wide, so a car occupies about 90–130 pixels in the AISD dataset. The spatial lower limit is therefore set to 130 pixels for regional optimization. Second, bright ground objects in shadow (e.g., white vehicles) are often mistaken for sunlit areas. Because white vehicles are small in the remote-sensing image, they appear as small holes in the detected shadow area, as shown in Figure 3d. The closing operation of mathematical morphology is applied to the detected shadow areas to fill these holes.
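The pixel estimate and the small-region removal can be sketched as follows. A 4 m × 2 m vehicle covers 8 / 0.09 ≈ 89 pixels and a 6 m × 2 m vehicle covers 12 / 0.09 ≈ 133 pixels, which matches the range of about 90–130 pixels. The sketch assumes 4-connectivity, which the text does not specify, and the function names are hypothetical.

```python
from collections import deque

def min_region_pixels(length_m=6.0, width_m=2.0, resolution_m=0.3):
    """Pixel footprint of a length_m x width_m object at the given resolution."""
    return (length_m * width_m) / (resolution_m ** 2)

def remove_small_regions(mask, min_pixels):
    """Drop 4-connected shadow regions smaller than min_pixels.

    mask: list of rows of 0/1 values; a new list of rows is returned.
    """
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_pixels:
                for y, x in region:
                    out[y][x] = 0
    return out
```

With the limit of 130 pixels used in this paper, the call would be `remove_small_regions(mask, 130)`.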
Mathematical morphology is widely used in image processing, especially in edge extraction and image segmentation [57]. Its two basic operations are dilation and erosion. Small holes can be filled by using the closing operation, which is dilation followed by erosion. For a target set $A$ and a structuring element $B$, these operations are expressed as:
$$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \varnothing \,\} \qquad (17)$$
$$A \ominus B = \{\, z \mid (B)_z \subseteq A \,\} \qquad (18)$$
$$A \bullet B = (A \oplus B) \ominus B \qquad (19)$$
Formulas (17)–(19) represent the dilation, erosion, and closing operations, respectively, where $(B)_z$ is $B$ translated by $z$ and $\hat{B}$ is the reflection of $B$. Dilation enlarges the boundary of the target set ($A$), and erosion shrinks it. By definition, the closing operation fills small holes and eliminates small, narrow cracks and voids while keeping the shapes and positions of objects unchanged. Therefore, the closing operation is adopted to eliminate the small holes in the detected shadow that are caused by small bright objects in the shadow area, as illustrated in Figure 3e.
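The hole-filling step can be illustrated with a minimal NumPy sketch of binary closing. The size and shape of the structuring element are not given in the text, so a 3 × 3 square is assumed. Pixels outside the image are treated as foreground during erosion so that regions touching the border are not shrunk, which is also an assumption of this sketch. The function names are hypothetical.

```python
import numpy as np

def dilate(mask, k=3):
    """Binary dilation with a k x k square structuring element (k odd)."""
    p = k // 2
    padded = np.pad(mask.astype(bool), p, constant_values=False)
    out = np.zeros(mask.shape, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask, k=3):
    """Binary erosion; pixels outside the image count as foreground."""
    p = k // 2
    padded = np.pad(mask.astype(bool), p, constant_values=True)
    out = np.ones(mask.shape, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def close_holes(mask, k=3):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask, k), k)
```

A one-pixel hole inside a shadow region is filled by the dilation and stays filled after the erosion, whereas an isolated shadow pixel is returned unchanged.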