The GMM is a widely used data analysis method capable of classifying data points into different groups based on their distribution. In various research fields, image segmentation is a crucial step in extracting specific objects of interest from the overall view. For instance, in autonomous driving, lane line detection is a fundamental problem for guidance design. Previous research papers, such as [18,19], employed the GMM as the primary technique for analyzing images and extracting lane curves for car guidance laws. The robustness of the GMM’s segmentation results can be ensured with a well-designed image-processing pipeline. Another study [20] provided detailed guidance on formulating the lane line extraction method based on the GMM and integrated the estimation results with Hough lines to achieve a high level of robustness in challenging driving scenes. Additionally, the GMM has shown potential in object tracking, as demonstrated in [21,22]. A 1D GMM fitting algorithm can be derived through Maximum Likelihood Estimation (MLE), and its pseudo-code is listed in Algorithm 1.
Algorithm 1 The 1D GMM fitting algorithm.
Input: x: 1D data; θ₀ = {w₀, μ₀, σ₀}: initial guess of GMM parameters; K: total number of clusters; ε: tolerance for iteration; N_max: maximum number of iterations
Output: θ = {w, μ, σ}: fitted GMM parameters
1: while iteration count < N_max do
2: update the responsibilities of each cluster for each sample
3: update the weights w
4: update the means μ
5: update the standard deviations σ
6: if the update amount < ε then
7: break
8: end if
9: end while
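For illustration, the following is a minimal NumPy sketch of the EM updates summarized in Algorithm 1; the variable names and the convergence test on the maximum parameter change are assumptions, not the exact implementation used in this work.

```python
import numpy as np

def fit_gmm_1d(x, w, mu, sigma, K, tol=1e-4, max_iter=100):
    """EM fitting of a 1D Gaussian mixture (sketch of Algorithm 1).

    x: 1D data; w, mu, sigma: length-K initial guesses; returns the fitted parameters.
    """
    x = np.asarray(x, dtype=float).ravel()
    w, mu, sigma = (np.asarray(v, dtype=float).copy() for v in (w, mu, sigma))
    assert len(w) == len(mu) == len(sigma) == K
    for _ in range(max_iter):
        w_old, mu_old, sigma_old = w.copy(), mu.copy(), sigma.copy()
        # E-step: responsibility of each cluster for each sample
        pdf = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = w * pdf
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and standard deviations
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # stop when the largest parameter change falls below the tolerance
        change = max(np.abs(w - w_old).max(), np.abs(mu - mu_old).max(),
                     np.abs(sigma - sigma_old).max())
        if change < tol:
            break
    return w, mu, sigma
```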
2.1. Image Segmentation in CIE-Lab Color Space
CIE-Lab represents colors by combining lightness (L*) with two color component axes, a* and b*, which correspond to the red-green and blue-yellow color dimensions, respectively. To segment color images, the two channels describing color information are used to perform GMM fitting.
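For example, the two chromatic channels can be obtained with scikit-image as sketched below; the file name is only a placeholder.

```python
from skimage import io, color

# Convert an RGB frame to CIE-Lab and keep only the chromatic channels
# ('frame.png' is a placeholder file name).
rgb = io.imread('frame.png')[:, :, :3]
lab = color.rgb2lab(rgb)
a_channel = lab[:, :, 1].ravel()   # a*: red-green axis
b_channel = lab[:, :, 2].ravel()   # b*: blue-yellow axis
# Each flattened channel is then fitted with the 1D GMM of Algorithm 1.
```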
Algorithm 1 summarizes the input for the GMM, which includes the data, the initial guesses of the Gaussian parameters, θ₀, and the desired total number of clusters, K. As with other fitting algorithms, these inputs play a crucial role in the convergence speed and accuracy of the fit. Therefore, GMM color image segmentation can be optimized through three aspects: (1) reducing the data size, (2) determining the best input parameters for the GMM, and (3) self-adjusting the GMM cluster number K according to the mission properties. To demonstrate the effectiveness of these methods, the finalized pipeline and example results for both simple and complex landing scenes are presented.
First, to evaluate the GMM’s time cost, a 1D data sequence taken from a simulated landing image, originally sized 720 × 1280 pixels, is used. The image is then resized to 360 × 640 pixels, and GMM segmentation is performed again. The results in Figure 4 show a significant reduction in the average operation time of the GMM, from 5.40 s to 0.78 s, after resizing the image to half its original size. This demonstrates the necessity of reducing the image resolution to meet the efficiency requirements of the system. Additionally, during the landing procedure, the runway is expected to appear in the lower part of the image, forming a trapezoid-like shape. To eliminate unnecessary regions, a quarter-triangle mask is designed, as illustrated in Figure 5, which rotates around the center of the image (green point) according to the current estimated roll angle. This mask reduces the data size by about 25%. Examples of cropped images are shown in Figure 6. In summary, by compressing the image to half its size and cropping it with the designed mask, the data size is reduced to roughly one-fifth of the original (a quarter of the pixels from resizing, with a further 25% removed by the mask).
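A possible OpenCV realization of the resizing and the roll-compensated mask is given below; the exact triangle geometry of Figure 5 is not reproduced, so the vertices (the two upper image corners and the image center, covering about 25% of the area) are an assumption.

```python
import cv2
import numpy as np

def preprocess(img, roll_deg):
    """Halve the resolution and mask out the upper region with a rotated quarter-triangle."""
    # 1) resize to half the original width and height (e.g., 720x1280 -> 360x640)
    img = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    # 2) triangle spanned by the two upper corners and the image center (~25% of the area),
    #    rotated about the center by the current estimated roll angle
    tri = np.array([[0.0, 0.0], [w, 0.0], [cx, cy]], dtype=np.float32)
    rot = cv2.getRotationMatrix2D((cx, cy), roll_deg, 1.0)
    tri = cv2.transform(tri[None, :, :], rot)[0].astype(np.int32)
    mask = np.full((h, w), 255, dtype=np.uint8)
    cv2.fillConvexPoly(mask, tri, 0)               # discard the masked region
    return cv2.bitwise_and(img, img, mask=mask), mask
```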
Next, the choices of the initial guesses are analyzed. In the following simple experiment, a manually generated data sequence of 307,200 samples is used. With different sets of initial guesses, the results vary, as shown in Table 1. Note that the weights of the Gaussian functions are fixed in this experiment. The findings indicate that when the initial guesses are closer to the ground-truth values, convergence is faster. Conversely, if the initial guesses are too far from the ground-truth values, the GMM might fail to converge to the global optimum. As mentioned earlier, it is therefore crucial to assign an appropriate set of initial guesses when using the GMM. Additionally, the choice of the number of clusters, K, affects the computational time; larger K values result in higher computational costs. Thus, selecting an adequate number of clusters to divide the data properly while avoiding excessive time consumption is essential.
To address this issue, an algorithm that automatically determines the initial guesses has been developed, which is only required in the first frame of the entire landing process, ensuring higher efficiency. Since the view of the camera does not change drastically within a short period, the variation between image histograms can be considered smooth. Consequently, the locations of peaks on the histogram move slowly, allowing one to reuse the estimated GMM results of the current frame as the initial guesses for the next frame.
As shown in Figure 7, the locations of the peaks in the pixel histogram play a significant role in selecting the initial guesses of the Gaussian means. The convergence points closely align with these peaks, as indicated by the red and black lines. The number of peaks corresponds to the number of clusters, K, required to adequately divide the data. Algorithm 2 outlines the procedure for initializing the parameters for the GMM.
Figure 8 illustrates the histograms of the a* and b* channels, along with the peaks estimated by this algorithm. To find the peaks, the built-in peak-finding function provided by MATLAB [23] is used directly. However, this function has several user-defined parameters that significantly impact the results; they are set as listed in Algorithm 2. To balance efficiency and accuracy, for the first frame of the landing procedure the maximum number of clusters is set to six and the minimum to three. If the total number of clusters falls outside this range, the threshold is adjusted accordingly. Additionally, after clustering the pixels in the a* and b* channels, pixels belonging to the same group in both channels are considered a single cluster. For example, if the GMM result indicates two clusters in the a* channel and three clusters in the b* channel, this implies a total of six image segments with indices (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), and (2, 3).
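The combination of the per-channel cluster indices described above can be sketched as follows; the label encoding is an assumption, and any bijective pairing of the two indices would serve equally well.

```python
import numpy as np

def combine_channel_labels(labels_a, labels_b):
    """Merge per-pixel cluster indices of the a* and b* channels into joint segments.

    With K_a clusters in a* and K_b clusters in b*, the pair (i, j) is encoded
    as i * K_b + j, giving up to K_a * K_b image segments.
    """
    k_b = labels_b.max() + 1
    return labels_a * k_b + labels_b

# Example: 2 clusters in a* and 3 clusters in b* -> up to 6 joint segments
labels_a = np.array([[0, 0, 1], [1, 0, 1]])
labels_b = np.array([[0, 2, 1], [2, 1, 0]])
print(combine_channel_labels(labels_a, labels_b))
```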
Lastly, as the landing proceeds, the complexity of the view decreases, as evidenced in Figure 9 and Figure 10. Note that the images in these figures were processed using the resizing and cropping methods introduced earlier in this section. The histograms show that the number of required clusters decreases during landing. If K were kept fixed at the value determined from the first frame throughout the entire landing process, some Gaussian function weights could decrease to zero, leading to divergence of the GMM. To address this issue, a mechanism is designed to check the estimated weights after each GMM calculation for a frame. If any weight is less than 0.1, the corresponding Gaussian function is removed and K is decreased accordingly. This ensures the stability and accuracy of GMM-based image segmentation during the landing process.
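This pruning step could be implemented as below; the 0.1 threshold comes from the text, while renormalizing the remaining weights is an assumption.

```python
import numpy as np

def prune_weak_components(w, mu, sigma, min_weight=0.1):
    """Drop Gaussian components whose weight fell below the threshold and shrink K."""
    keep = w >= min_weight
    w, mu, sigma = w[keep], mu[keep], sigma[keep]
    w = w / w.sum()                        # renormalize the remaining weights (assumption)
    return w, mu, sigma, int(keep.sum())   # the last value is the new K
```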
Algorithm 2 Initialization of GMM parameters for color images in the CIE-Lab color space.
Input: data in channels a* and b*
Output: θ₀ = {w₀, μ₀, σ₀}: GMM initial guesses; K: number of clusters
1: Initialize parameters for finding peaks:
2: MIN_PEAK_DIST_A = … (minimum peak distance, a* channel)
3: MIN_PEAK_DIST_B = … (minimum peak distance, b* channel)
4: MIN_PEAK_HEIGHT = 0.02
5: TS = 0.0001
6: Find peaks of the histograms using the four parameters
7: K ← the number of peaks found
8: while K is less than 3 or more than 6 do
9: if K < 3 then
10: TS = TS/2
11: else if K > 6 then
12: TS = TS*2
13: end if
14: Find peaks using the four parameters
15: end while
16: w₀ ← heights of the peaks, normalized
17: μ₀ ← locations of the peaks
18: σ₀ = 5
19: return θ₀ = {w₀, μ₀, σ₀} and K
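The paper relies on MATLAB’s peak-finding function [23]; the sketch below uses scipy.signal.find_peaks as a rough equivalent of Algorithm 2. The histogram binning, the minimum-peak-distance value, and the mapping of TS onto the threshold argument are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def init_gmm_from_histogram(channel, min_peak_dist=10, min_peak_height=0.02,
                            ts=1e-4, k_min=3, k_max=6, sigma_init=5.0):
    """Derive GMM initial guesses from histogram peaks (sketch of Algorithm 2)."""
    hist, edges = np.histogram(channel, bins=256)
    hist = hist / hist.sum()                       # normalized histogram
    centers = 0.5 * (edges[:-1] + edges[1:])

    def detect(threshold):
        idx, _ = find_peaks(hist, height=min_peak_height,
                            distance=min_peak_dist, threshold=threshold)
        return idx

    idx = detect(ts)
    for _ in range(20):                            # adapt TS until K lies in [k_min, k_max]
        if k_min <= len(idx) <= k_max:
            break
        ts = ts / 2 if len(idx) < k_min else ts * 2
        idx = detect(ts)

    K = len(idx)
    mu0 = centers[idx]                             # peak locations -> initial means
    w0 = hist[idx] / hist[idx].sum()               # normalized peak heights -> initial weights
    sig0 = np.full(K, sigma_init)                  # fixed initial standard deviation
    return w0, mu0, sig0, K
```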
To validate the developed method and evaluate the image segmentation performance, two landing simulation image sequences are used: one with a simple scene and another with a complex scene, as shown in Figure 11. The corresponding segmentation outcomes are illustrated in Figure 12 and Figure 13, while the computational time results are recorded in Table 2 and Table 3. Note that the maximum and minimum values are not meaningful for the peak-finding phase, since it is performed only once.
These results demonstrate that the method successfully extracted the runway almost exclusively, and the reduction mechanism of K worked as expected. Note that in Figure 13b, the segment on the right-hand side appears fragmented. However, it actually contains the complete runway, which is obscured by a very dark shadow. Also, the segment in the complex scene contains some non-ground objects, as shown in Figure 13a, because these objects have a similar color to the runway, making it impossible to separate them through color-based clustering techniques.
Additionally, through the strategic removal of unnecessary regions and the reduction of the image resolution, the computational demands were significantly reduced while maintaining an acceptable level of segmentation accuracy. This ensures that the proposed method can efficiently handle real-time image segmentation tasks during the landing process.
To conclude this section, an adaptive pipeline is developed that automatically determines the GMM initial guesses, θ₀, and the cluster number, K, for the first frame of the landing procedure. The fitted GMM parameters of each frame are then fed back to the GMM algorithm as the initial guesses for the next frame, as shown in Figure 14. In addition, efficiency has been increased by the image pre-processing techniques, and GMM convergence for each frame has been ensured by automatically culling redundant Gaussian functions.
2.2. Integrating the Proposed Method with ORB-SLAM2
This section focuses on extracting the ground information from all the clusters generated through the GMM, which is crucial for landing attitude guidance. Additionally, the cascading effects between the input images and ORB-SLAM2 are discussed in detail.
To begin, the process of extracting the landing area is explained. The sample inputs (Figure 15) include two test images, and the segmentation results are presented in Figure 16a,b. These results highlight the cluster containing the runway with a red rectangle. It can be observed that paved areas, like runways, have a darker appearance compared to their surroundings. Hence, the cluster with the lowest average gray value is considered a potential landing area candidate.
Sometimes, insignificant fragments may also have a very low average gray value, which leads to an unsuccessful selection. As shown in Figure 16b, the average gray value of Segment 1 is 49.19, slightly lower than that of Segment 4, which is the correct desired segment. To address this, an iterative process is introduced to ensure a sufficient number of pixels in the selected cluster, avoiding the selection of a small fragment with a low average gray value. If the pixel-count criterion is not met, the algorithm selects the cluster with the next lowest average gray value until it finds a suitable candidate. Finally, this extracted landing area serves as the input for ORB-SLAM2.
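A minimal sketch of this selection step is shown below; the minimum pixel count is an assumed value, not the threshold used in the paper.

```python
import numpy as np

def select_landing_segment(gray, labels, min_pixels=2000):
    """Pick the darkest sufficiently large cluster as the landing-area candidate.

    gray   : grayscale image, shape (H, W)
    labels : per-pixel cluster indices from the GMM segmentation, shape (H, W)
    """
    stats = []
    for k in np.unique(labels):
        pixels = gray[labels == k]
        stats.append((pixels.mean(), pixels.size, k))
    # walk from the lowest average gray value upwards until the size criterion is met
    for mean_val, size, k in sorted(stats):
        if size >= min_pixels:
            return k
    return None   # no cluster satisfies the pixel-count criterion
```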
However, feeding the image segment into ORB-SLAM2 may decrease the feature robustness and the accuracy of SLAM. First, since the feature points are restricted to a smaller region, ORB-SLAM2 lacks a sufficient number of feature points to estimate the current pose in some frames, leading to tracking loss (Figure 17a). Second, the pixels on the hard edges of the mask may be detected as FAST keypoints, as shown in Figure 17b. These keypoints have very low reference value since they do not correspond to physical edges in the scene, which leads to a decline in feature-matching accuracy.
To solve these issues, additional image-processing steps are integrated after GMM segmentation. Figure 18a shows the segmented runway generated by the GMM in the form of a binary mask. The lines on the runway are culled, leaving pixel holes in this segment. As the first step, image dilation is applied using a structuring element to enlarge the informative region (Figure 18b), which allows FAST to select more feature points around the runway. Next, a Gaussian kernel is used to smooth the edges of the image segment, preventing incorrect keypoint selection on hard edges (Figure 18c) and thus reducing the possibility of feature mismatches. After these additional steps, the binary mask is applied to the image through a pixel-wise product, resulting in the filtered image (Figure 19b).
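These three steps can be sketched with OpenCV as follows; the structuring-element and kernel sizes are illustrative assumptions.

```python
import cv2
import numpy as np

def refine_mask_and_filter(image, runway_mask, dilate_size=15, blur_size=21):
    """Dilate the binary runway mask, soften its edges, and apply it to the image.

    image       : 3-channel BGR frame
    runway_mask : uint8 binary mask (0 or 255) of the extracted runway segment
    """
    # 1) dilation closes the holes left by the culled runway lines and widens the region
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (dilate_size, dilate_size))
    mask = cv2.dilate(runway_mask, kernel)
    # 2) Gaussian smoothing turns the hard mask border into a soft transition,
    #    so FAST does not fire on the artificial edge
    mask = cv2.GaussianBlur(mask, (blur_size, blur_size), 0)
    # 3) pixel-wise product keeps only the (softly bounded) runway region
    weight = (mask.astype(np.float32) / 255.0)[:, :, None]
    filtered = (image.astype(np.float32) * weight).astype(np.uint8)
    return filtered, mask
```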
As presented in the figure, the processed region is enlarged horizontally and now includes areas with strong texture, implying that FAST can select more keypoints and thus increasing the accuracy of feature matching. Also, the edges of the segment are smoothed, addressing the issue of low-reference-value feature points. The adjusted results corresponding to Figure 17 are shown in Figure 20, indicating that the two problems have been solved by these additional image-processing steps.