1. Introduction
Digital maps are discrete data sets that record and store geographical features in digital form within a defined coordinate system, including specific locations, attributes, relational indicators, and names [1,2]. Traditional paper maps, which transform data into graphical representations, are time-consuming, labor-intensive, and often result in lower data precision [3]. In recent years, geological disasters have occurred frequently, with an increasing trend in the occurrence of disasters such as landslides, mudslides, floods, and earthquakes [4,5]. Acquiring the latest geographical information of disaster-stricken areas has become a critical aspect of, and an important guarantee for, rapid emergency rescue after disasters, directly affecting the efficiency of operations during the golden rescue period and the level of protection for people's lives and property [1,4,6]. Therefore, two-dimensional digital map modeling based on UAV aerial images plays a significant role in enhancing the response speed of emergency rescue operations and mitigating the impact of disasters [7].
Traditional survey methods, such as proximity inquiries and visual estimations, are increasingly inadequate given the complexity of disaster situations and terrain. Methods such as portable image-transmission units and satellite imagery are likewise limited by signal interference, terrain challenges, and complex building structures, which reduce their effectiveness. Advances in drone technology have significantly enhanced emergency response by capturing multi-angle photographs of disaster sites and transmitting real-time images, improving the timeliness of rescue operations [8].
The principle of UAV mapping involves mounting a camera on the drone to capture images of the Earth's surface, taking advantage of the drone's flexibility. The acquired images are then processed to produce high-precision digital maps [9]. Owing to their low cost, flexibility, and high resolution, UAVs have been increasingly applied in important fields such as environmental monitoring, remote sensing, and target tracking [10,11]. Therefore, this study selects UAV imagery as the image source.
Image registration, a critical step in digital map production, aims to correct misalignments caused by variations in lighting, scale, displacement, and rotation across different modalities [12]. Ramli et al. proposed the CURVE feature extraction technique, which combines retinal vascular and noise features to enhance fundus image registration accuracy, achieving 86% accuracy and significantly outperforming existing methods [13]. Liang et al. developed a binary fast image registration method based on fused information, improving FAST feature detection and weighted angular diffusion radial sampling to achieve rapid and accurate UAV image registration [14]. Gu et al. introduced the AC-SURF algorithm with an angle compensation strategy for damage detection in rotating blades, enhancing the efficiency and accuracy of wind turbine blade monitoring and damage detection through digital image correlation techniques [15].
To obtain images with a wider field of view, image stitching is a necessary preprocessing step in UAV remote sensing applications [16]. Therefore, after image registration, an appropriate method must be selected to stitch multiple images together. This involves projecting the overlapping images onto the same reference plane through spatial transformation, aligning the overlapping regions in a single step to create a panoramic image [17]. Due to factors such as the capture time, location, and lens distortion of the UAV, the overlap between adjacent UAV images is irregular. Consequently, traditional stitching methods often exhibit significant geometric and tonal discrepancies at the stitching boundaries, resulting in noticeable gaps in the stitched map [18]. Currently, extensive research focuses on improving the quality of image stitching. Jia et al. proposed a multi-feature extraction algorithm based on grayscale, complementary wavelet (CW) chrominance, sharpness, and natural scene statistics (NSS) for image stitching tampering detection, significantly improving detection accuracy [19]. Zhang et al. presented an improved scale-invariant feature transform (SIFT) method for underwater image stitching, enhancing feature extraction and precise matching, which notably increased stitching quality and robustness and effectively reduced ghosting and distortion [20]. Li et al. proposed an improved RANSAC-SURF algorithm for vehicle seat type detection and spring hook missing recognition, enhancing detection accuracy and robustness [21]. Liu et al. introduced a merge-sorting-based method for multi-scene image stitching, significantly reducing computational time, improving the efficiency of image registration and stitching, and minimizing distortion in the stitched images [22]. Chen et al. proposed a UAV image stitching method based on diffusion models, effectively eliminating irregular boundaries and seams in image stitching and enhancing the perceptual quality of the stitched images [16]. However, image stitching techniques for scenes containing moving objects still require further research.
Current algorithms struggle with unstructured roads and frequently changing geographical environments, leading to unstable feature extraction and uneven exposure, which fail to meet the requirements of high-precision digital maps. To address these issues, this paper proposes a two-dimensional digital map modeling method based on UAV aerial images that enhances map accuracy. The innovations include:
- (1)
To address the incomplete feature extraction of sequential images by the SIFT and SURF algorithms, we propose the C-SURF algorithm to enhance feature detection efficiency. Moreover, we improve feature matching accuracy by optimizing the dimensionality of the feature descriptors, thereby providing reliable data for sequential image stitching;
- (2)
To mitigate the ghosting and color artifacts in image stitching caused by moving objects, we propose a novel energy function that integrates pixel texture features, improving the visual quality of stitched sequential images by reducing these artifacts.
2. Feature Extraction and Registration Based on the C-SURF Algorithm
2.1. The C-SURF Algorithm
Feature-based image registration is a core component of image stitching, and the quality of feature point extraction significantly influences feature matching and image stitching. The production of digital maps imposes strict requirements on image quality. Commonly used feature point extraction algorithms, such as SIFT and speeded-up robust features (SURF), encounter challenges in detecting feature points in images that contain dynamic scenes, leading to the potential omission of certain elements.
The traditional SURF algorithm is an enhancement of the SIFT algorithm, resulting in similar operational processes. Initially, the SURF algorithm performs Hessian matrix analysis, followed by non-maximum suppression, while employing integral images to accelerate feature extraction. In generating feature descriptors, the SURF algorithm computes a 64-dimensional feature vector, which diminishes the accuracy of feature matching compared to the 128-dimensional feature vector utilized in the SIFT algorithm. Consequently, while the SURF algorithm retains the stability, robustness, and rotation invariance of the SIFT algorithm and enhances the real-time performance of feature extraction, it sacrifices the precision of feature registration.
To address this limitation, this paper proposes an algorithm that combines an improved Canny edge detection algorithm with an enhanced SURF algorithm, referred to as the C-SURF algorithm. The C-SURF algorithm not only improves the quality of image feature extraction but also increases the accuracy of feature matching. The specific implementation steps of the algorithm are as follows (a minimal code sketch of the pipeline appears after the list):
- (1)
The source image undergoes high-contrast denoising preprocessing to eliminate noise, ensuring that edge information in the source image is preserved and enhancing the stability of feature extraction and contour information;
- (2)
The improved Canny edge detection algorithm is employed to extract edge information from the source image;
- (3)
An improved SURF feature extraction algorithm based on logarithmic polar coordinates is utilized to perform feature detection on the image obtained in step two, ultimately resulting in a set of features from the image.
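To make the pipeline concrete, the following Python sketch chains the three steps with OpenCV. It is a minimal sketch, not the authors' implementation: the boost coefficient, Canny thresholds, and Hessian threshold are assumed values, and SURF_create requires an opencv-contrib build. The stock 64-dimensional SURF descriptor stands in here for the improved 136-dimensional log-polar descriptor described in Section 2.3.

```python
import cv2
import numpy as np

def c_surf_features(gray):
    """Minimal sketch of the C-SURF pipeline on a grayscale image."""
    # Step 1: high-contrast (high-boost) denoising in place of Gaussian blur;
    # the 3x3 mask and boost coefficient A are illustrative assumptions.
    A = 1.5
    mask = np.array([[-1, -1, -1],
                     [-1, A + 8, -1],
                     [-1, -1, -1]], dtype=np.float32)
    boosted = cv2.filter2D(gray, -1, mask)

    # Step 2: Canny edge extraction on the denoised image (assumed thresholds).
    edges = cv2.Canny(boosted, 50, 150)

    # Step 3: SURF detection on the edge image (needs opencv-contrib);
    # the improved 136-D log-polar descriptor would be computed here instead.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    keypoints, descriptors = surf.detectAndCompute(edges, None)
    return keypoints, descriptors
```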
2.2. Improved Canny Edge Detection Algorithm in the C-SURF Algorithm
The Canny edge detection algorithm is an efficient and straightforward process with a small data footprint. However, the Gaussian filtering used in its standard form weakens edges while remaining sensitive to noise, making the algorithm prone to generating spurious edges. Therefore, before integrating it with the SURF algorithm, it is necessary to modify it by employing high-contrast filtering in place of Gaussian filtering, achieving noise reduction while effectively preserving edge information. The enhanced algorithm consists of four steps: high-contrast image denoising, gradient calculation, feature extraction, and hysteresis edge tracking.
- (1)
High-Contrast Image Denoising
The expression of high-contrast (high-boost, H-B) filtering is shown in Equation (1):

$$g(x, y) = \sum_{s=-1}^{1} \sum_{t=-1}^{1} w(s, t)\, f(x + s,\, y + t) \tag{1}$$

where $g(x, y)$ represents the pixel value of the pixel point after H-B filtering, $f(x, y)$ represents the pixel value of the pixel at coordinate $(x, y)$ in the source image, and $w$ represents the weight coefficient of the mask used in the filtering process, as shown in Equation (2):

$$w = \begin{bmatrix} -1 & -1 & -1 \\ -1 & A + 8 & -1 \\ -1 & -1 & -1 \end{bmatrix} \tag{2}$$

where $A$ is the boost coefficient.
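For reference, a direct NumPy rendering of Equations (1) and (2) might look as follows; the mask values, in particular the boost coefficient A, are assumptions carried over from the standard high-boost formulation.

```python
import numpy as np

def high_boost_filter(f, A=1.5):
    """Convolve source image f with the assumed 3x3 high-boost mask w
    (Equations (1)-(2)); A is the assumed boost coefficient."""
    w = np.array([[-1, -1, -1],
                  [-1, A + 8, -1],
                  [-1, -1, -1]], dtype=np.float64)
    padded = np.pad(f.astype(np.float64), 1, mode="edge")
    g = np.zeros(f.shape, dtype=np.float64)
    # Equation (1): g(x, y) = sum_s sum_t w(s, t) * f(x + s, y + t)
    for s in range(3):
        for t in range(3):
            g += w[s, t] * padded[s:s + f.shape[0], t:t + f.shape[1]]
    return np.clip(g, 0, 255).astype(np.uint8)
```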
- (2)
Gradient Calculation

By iteratively calculating the pixel gradient magnitude $G$ and the gradient direction $\theta$, the extrema of the target function are identified, as shown in Equation (3):

$$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\!\left(\frac{G_y}{G_x}\right) \tag{3}$$

where $G_x$ and $G_y$ represent the first-order partial derivatives of the pixel gradient in the $x$ and $y$ directions, respectively.
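Equation (3) maps directly onto Sobel derivatives; a short sketch, using OpenCV's Sobel operator as the derivative approximation, is:

```python
import cv2
import numpy as np

def gradient_magnitude_direction(img):
    """Equation (3): magnitude G and direction theta from the first-order
    partial derivatives Gx and Gy, approximated with 3x3 Sobel kernels."""
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
    G = np.sqrt(gx ** 2 + gy ** 2)
    theta = np.arctan2(gy, gx)  # full-quadrant version of arctan(Gy / Gx)
    return G, theta
```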
- (3)
Feature Extraction
The feature extraction process commences with non-maximum suppression: the magnitude of the target pixel along the gradient direction is calculated and compared with the pixels in its local neighborhood; if the magnitude of the target pixel exceeds that of its surrounding pixels, it is retained; otherwise, it is suppressed. A double-threshold method then segments the image, categorizing all pixels into three classes based on their gradient magnitudes: strong edge points with gradient values exceeding the high threshold, weak edge points with gradient values between the two thresholds, and suppressed points with gradient values below the low threshold. The weak edges require further processing, as they encompass not only genuine edges but also spurious edges induced by noise and grayscale variations. The suppressed pixels no longer appear in the target image.
- (4)
Hysteresis Edge Tracking
During non-maximum suppression, weak edges are identified. To filter out the true edges, it is assumed that genuine edges are typically connected to strong edges. A 3 × 3 rectangular window is therefore employed to examine each weak edge pixel: if a strong edge point is detected within the window, the pixel is retained as a true edge; otherwise, it is discarded as a spurious point. Continuous iterative tracking is used to confirm the true edges, as sketched below.
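The following sketch combines the double-threshold classification of step (3) with the hysteresis tracking of step (4); the iterative 3 × 3 window check is expressed as repeated binary dilation, and the threshold values are left to the caller.

```python
import numpy as np
from scipy import ndimage

def double_threshold_hysteresis(G, low, high):
    """Classify gradient magnitudes into strong/weak/suppressed pixels,
    then keep weak pixels only if connected to a strong edge (3x3 window)."""
    strong = G >= high
    weak = (G >= low) & (G < high)
    edges = strong.copy()
    while True:
        # grow current edges into adjacent weak pixels (3x3 neighborhood)
        grown = ndimage.binary_dilation(edges, np.ones((3, 3), bool)) & weak
        new_edges = edges | grown
        if np.array_equal(new_edges, edges):
            break  # no weak pixel was promoted; tracking has converged
        edges = new_edges
    return edges
```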
2.3. Improved SURF Feature Extraction in the C-SURF Algorithm
The traditional SURF algorithm frequently employs the concept of integral images and applies appropriate simplifications and approximations to the integrals within the Hessian matrix [13]. However, the traditional SURF algorithm constructs descriptors by statistically extracting histograms and determining the principal orientation of features in the neighboring region. It then sums the responses in two directions along the principal orientation and the absolute sum of the responses within a 4 × 4 sub-region, ultimately describing the corresponding feature points with a 64-dimensional feature vector. Consequently, compared to the SIFT feature extraction algorithm, although the reduced dimensionality of the SURF descriptor alleviates data redundancy, it also results in a loss of feature registration accuracy.
In the C-SURF algorithm, the improvement to SURF focuses primarily on the descriptor. The descriptor extracts logarithmic polar coordinates from the neighborhoods of keypoints obtained at different scales, thereby enhancing the accuracy of image registration. More specifically, the neighborhood in logarithmic polar coordinates is first segmented into three concentric rings at radial distances of 6, 9, and 15, excluding the central pixel; the two outer rings are each further divided into eight angular sectors, which together with the innermost region yields a total of 17 sub-regions (1 + 8 + 8). Each sub-region accumulates a gradient histogram over eight directions, ultimately yielding a 136-dimensional (17 × 8) feature descriptor. The specific construction process is illustrated in Figure 1.
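To make the 17 × 8 layout concrete, the sketch below accumulates a 136-dimensional descriptor from precomputed gradient magnitude and direction maps. It is a simplified reading of the construction: the Gaussian-plus-distance weighting of Equation (5) and histogram bin interpolation are omitted, the keypoint is assumed to lie at least 15 px from the image border, and the function and variable names are illustrative.

```python
import numpy as np

def log_polar_descriptor(G, theta, cx, cy, kp_angle):
    """136-D descriptor: 17 log-polar sub-regions (inner disk r <= 6, plus
    rings 6 < r <= 9 and 9 < r <= 15, each split into 8 sectors) times an
    8-direction gradient histogram per sub-region."""
    hist = np.zeros((17, 8))
    for dy in range(-15, 16):
        for dx in range(-15, 16):
            r = np.hypot(dx, dy)
            if r == 0 or r > 15:           # skip the centre pixel and outside
                continue
            # rotate the sampling offset by the keypoint orientation
            ang = (np.arctan2(dy, dx) - kp_angle) % (2 * np.pi)
            sector = min(int(ang / (np.pi / 4)), 7)
            if r <= 6:
                region = 0                  # innermost disk: one region
            elif r <= 9:
                region = 1 + sector         # middle ring: 8 sectors
            else:
                region = 9 + sector         # outer ring: 8 sectors
            gdir = (theta[cy + dy, cx + dx] - kp_angle) % (2 * np.pi)
            obin = min(int(gdir / (np.pi / 4)), 7)
            hist[region, obin] += G[cy + dy, cx + dx]  # magnitude-weighted
    desc = hist.ravel()                     # 17 x 8 = 136 dimensions
    return desc / (np.linalg.norm(desc) + 1e-12)
```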
To achieve rotation invariance, the logarithmic polar coordinates must be aligned with the orientation of the keypoint. After rotation by the keypoint orientation $\theta$, the new coordinates $(x', y')$ of a sampling point $(x, y)$ in the neighborhood of the central pixel are as shown in Equation (4):

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \tag{4}$$
In terms of assigning pixel weights around keypoints, both the SIFT feature extraction algorithm and the traditional SURF feature extraction algorithm employ a Gaussian weighting function. The Gaussian weighting function assigns larger weights to pixels closer to the center pixel and smaller weights to those farther away, thereby reducing the influence of distant pixels on the central pixel. However, during nearest-neighbor matching, Gaussian weighting tends to blur the distinction between the nearest and the second-nearest neighbors, which can easily lead to mismatches and significantly increase the computational burden for improving registration accuracy. In contrast, distance-based weighting amplifies the difference between the nearest and the second-nearest neighbors during nearest neighbor matching, which is beneficial for enhancing the registration accuracy of feature points. To address the deficiencies associated with both Gaussian and distance-based weighting schemes in registration, this paper adopts a method that combines Gaussian and distance weights to improve the accuracy of subsequent feature point registration. The new weight calculation formula is presented in Equation (5):
where $d = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}$ represents the Euclidean distance between a neighboring pixel and the central pixel, $(x_1, y_1)$ denotes the coordinates of the neighboring pixel, and $(x_0, y_0)$ signifies the coordinates of the central pixel.
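The exact combination used in Equation (5) cannot be reproduced from the text alone; purely for illustration, one product-form combination of a Gaussian weight and a distance-based weight could look like the following, where σ and the distance term are assumptions.

```python
import numpy as np

def combined_weight(x1, y1, x0, y0, sigma=7.5):
    """Illustrative Gaussian-times-distance weight for a neighboring pixel
    (x1, y1) around the central pixel (x0, y0); not the paper's exact form."""
    d = np.hypot(x1 - x0, y1 - y0)                 # Euclidean distance d
    gaussian = np.exp(-d ** 2 / (2 * sigma ** 2))  # suppress distant pixels
    distance = 1.0 / (1.0 + d)                     # sharpen the nearest/second-nearest gap
    return gaussian * distance
```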
3. Image Stitching Based on the Novel Energy Function
Feature matching using the improved C-SURF algorithm yields relatively ideal results. For multiple images, image stitching is also required. Conventional image stitching methods abound, such as those based on trigonometric function weights, pixel-weighted averaging, and direct averaging. However, these methods often fall short, primarily because they struggle to achieve seamless stitching of color images, and when there are moving objects in the images to be stitched, the result is prone to ghosting and related artifacts [14]. Therefore, an optimal seam-finding method is needed to address discontinuities and ghosting in the orthoimage sequences obtained by UAVs.
This paper transforms the problem of finding the best seam into a pixel labeling problem. By constructing an energy function based on the energy spectrum, smoothness terms, and problem-specific factors, a novel minimum cut and maximum flow connected graph is derived, ultimately generating the optimal seam. In addition to the energy spectrum (data) term and the smoothness term, the texture features of neighboring pixels with the same label are introduced as an expansion factor, and the novel energy function is shown in Equation (6):

$$E(l) = \sum_{p} E_d(p, l_p) + \sum_{(p,q) \in N} E_s(p, q, l_p, l_q) + \sum_{(p,q) \in N} E_t(p, q) \tag{6}$$

where $l$ is a labeling of the pixels in the overlapping region, $E_d$ is the data term, $E_s$ is the smoothness term over the neighborhood system $N$, and $E_t$ is the texture feature term.
A crucial step in the minimum cut and maximum flow algorithm is the construction of a connected graph. As depicted in Figure 2, assuming the source node is S and the sink node is T (hereinafter referred to as the S-T connected graph), the process is as follows. Pixels within the overlapping region correspond to nodes in the connected graph, which comprises a set of vertices V and a set of edges E. The network flow between the source and sink nodes traverses the nodes within the overlapping region, and different weights are assigned based on the distinct energy spectra. In the graph, the weights of the t connections are derived from the data term $E_d$, those of the n connections from the smoothness term $E_s$, and those of the m connections from the texture feature term $E_t$. The thickness of the network lines represents the magnitude of the weights; the closer the pixel values of two adjacent pixels p and q are to each other, the greater the weight assigned, indicating that pixels p and q likely originate from the same image. Conversely, if the pixel values are dissimilar, the two pixels may originate from two different source images. In this way, the S-T connected graph over the pixels of the overlapping region between different images is formed.
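This construction can be prototyped with a general-purpose max-flow solver such as the one in networkx. In the sketch below, Ed is an assumed (h, w, 2) array of data-term costs for assigning each overlap pixel to I1 or I2, and Es_h/Es_v hold the combined smoothness-plus-texture weights of horizontal and vertical neighbor links; dedicated graph-cut libraries would be far faster in practice.

```python
import networkx as nx

def seam_labels_by_mincut(Ed, Es_h, Es_v):
    """Build the S-T connected graph over overlap pixels and assign labels
    via minimum cut / maximum flow (prototype under assumed array layouts)."""
    h, w = Ed.shape[:2]
    G = nx.DiGraph()
    for y in range(h):
        for x in range(w):
            p = (y, x)
            # t-links: data-term weights toward source S (I1) and sink T (I2)
            G.add_edge('S', p, capacity=float(Ed[y, x, 0]))
            G.add_edge(p, 'T', capacity=float(Ed[y, x, 1]))
            # n-links: smoothness (+ texture) weights between 4-neighbors,
            # added in both directions since the penalty is symmetric
            if x + 1 < w:
                G.add_edge(p, (y, x + 1), capacity=float(Es_h[y, x]))
                G.add_edge((y, x + 1), p, capacity=float(Es_h[y, x]))
            if y + 1 < h:
                G.add_edge(p, (y + 1, x), capacity=float(Es_v[y, x]))
                G.add_edge((y + 1, x), p, capacity=float(Es_v[y, x]))
    _, (source_side, _) = nx.minimum_cut(G, 'S', 'T')
    # pixels on the source side take their value from I1, the rest from I2
    return [[0 if (y, x) in source_side else 1 for x in range(w)]
            for y in range(h)]
```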
The calculation formula for the texture feature term is shown in Equation (7), where λ represents the adjustment parameter (set to a default value of 1 in the experiments), $p$ denotes a pixel point within the overlapping region of the two images, and $S(p)$ represents the feature spectrum within the central 3 × 3 neighborhood of $p$.
The result of reassigning labels is obtained by optimizing the S-T connected graph using the min-cut and max-flow algorithms. The label $l_p$ is set to a specified value that divides the pixels of the overlapping area into three categories: pixels that lie in source image I1 but not in source image I2; pixels that lie in I2 but not in I1; and pixels that lie in both I1 and I2. The expression is given in Equation (8):

$$l_p = \begin{cases} 0, & p \in I_1 \ \text{and}\ p \notin I_2 \\ 1, & p \notin I_1 \ \text{and}\ p \in I_2 \\ 2, & p \in I_1 \ \text{and}\ p \in I_2 \end{cases} \tag{8}$$

where $l_p$ denotes the label of pixel $p$, and $p$ represents the coordinates of the pixel.
The smoothness term, which defines the discontinuity between adjacent pixels within a four-neighborhood N, plays a crucial role in the definition of the entire energy function. It represents the difference between adjacent pixels in the overlapping region and directly influences the quality of the stitched image. In traditional energy functions, however, to optimize the regularization term of the objective function, enhance the model's generalization ability, and simplify fitting, the smoothness term is often based on the L2 norm, as in Equation (9):

$$E_s(p, q) = \left\| I_1(p) - I_2(p) \right\|_2 + \left\| I_1(q) - I_2(q) \right\|_2 \tag{9}$$

where $I_1(\cdot)$ and $I_2(\cdot)$ denote the pixel values of the two source images at the given coordinates.
A well-designed seam should ideally bypass the moving objects within the overlapping region to minimize artifacts, yet the smoothness term based on the L2 norm fails to adequately distinguish the misaligned areas within the overlap. This is because the L2 norm-based smoothness term does not impose an appropriate penalty on moving objects within the overlap. To mitigate this issue, this paper reconsiders the choice of norm for the smoothness term.
In order for the novel energy function to more effectively differentiate between aligned and misaligned regions of moving objects within the overlap, a suitable function must be found that allows the seam to better avoid these objects. The goal is to maximize the penalty of the smoothness term in misaligned regions and minimize it in aligned regions, thereby distinguishing between them. Assuming that pixels with the same label in source images I1 and I2 are denoted as x, the aim is to capture differences in the same moving objects across the overlapping regions of the two source images as far as possible so that the seam can avoid them. A new function is therefore defined based on the L1 norm, as shown in Equation (10):

$$f(x) = \left\| I_1(x) - I_2(x) \right\|_1 \tag{10}$$

The new smoothness term is then expressed as in Equation (11):

$$E_s(p, q) = f(p) + f(q) \tag{11}$$
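Assuming the reconstructed forms of Equations (9)-(11) above, the two penalties can be compared directly. For multi-channel pixels the L1 sum is never smaller than the L2 norm, so misaligned (moving-object) pixels receive a relatively larger penalty:

```python
import numpy as np

def smoothness_l2(I1, I2, p, q):
    """Equation (9): L2-norm smoothness over an adjacent pixel pair (p, q)."""
    d1 = I1[p].astype(float) - I2[p].astype(float)
    d2 = I1[q].astype(float) - I2[q].astype(float)
    return np.linalg.norm(d1) + np.linalg.norm(d2)

def smoothness_l1(I1, I2, p, q):
    """Equations (10)-(11): f(x) = ||I1(x) - I2(x)||_1, E_s = f(p) + f(q)."""
    f = lambda x: np.abs(I1[x].astype(float) - I2[x].astype(float)).sum()
    return f(p) + f(q)
```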
The specific procedure of the algorithm is detailed in Algorithm 1. Initially, feature extraction and matching are performed on the images to calculate a global perspective transformation matrix H. Based on matrix H, the overlapping region R and the boundaries of the two source images are identified. Subsequently, the data term, smoothness term, and texture term of the novel energy function are computed. The minimum cut and maximum flow algorithm is then applied to assign labels to pixels within the overlapping region, thereby identifying the optimal seam. Finally, Poisson fusion is used for color correction of the stitched image, resulting in an ideal panorama. When seeking the optimal seam, the search begins with the pixels in the first row of each column within the overlapping region's boundary. The energy function is calculated for each starting point, and the search then expands downward according to the energy formula: for each pixel, the energy values of the three adjacent pixels in the next row (directly below and diagonally below on either side) are computed and summed with the accumulated energy of the expansion point in the previous row. This process continues until the last row is reached. Ultimately, the path connecting the pixels with the minimum accumulated energy is selected as the optimal stitching line (a code sketch of this search follows Algorithm 1).
Algorithm 1: Image Stitching Algorithm Based on the Novel Energy Function
Input: Source Image I1 and Source Image I2.
Output: Panoramic Image I.
1: Perform feature extraction on the source images using the C-SURF algorithm.
2: Conduct a coarse matching of the extracted features using a brute-force approach and refine the matches using the RANSAC algorithm.
3: Perform bundle adjustment on the images and calculate the global perspective transformation matrix H.
4: Determine the overlapping region R based on matrix H and identify the boundaries of region R.
5: Compute the data term according to Equation (8).
6: Compute the smoothness term according to Equation (11).
7: Compute the texture feature term according to Equation (7).
8: Substitute these terms into the novel energy function, as shown in Equation (6).
9: Solve the energy function using the minimum cut and maximum flow algorithm, and assign labels to pixels within the overlapping region.
10: Obtain the stitched panoramic image using Poisson fusion.
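The column-wise expansion described above is a dynamic-programming search; a compact sketch over an assumed per-pixel energy map E of the overlap region is given below. The final Poisson fusion step (line 10) can be prototyped with OpenCV's cv2.seamlessClone.

```python
import numpy as np

def dp_optimal_seam(E):
    """Dynamic-programming seam search: accumulate energies downward through
    the three adjacent pixels of each next row, then backtrack the
    minimum-energy path. E is an (h, w) energy map of the overlap region."""
    h, w = E.shape
    acc = E.astype(np.float64).copy()
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w, x + 2)
            acc[y, x] += acc[y - 1, lo:hi].min()
    # backtrack from the minimum accumulated energy in the last row
    seam = [int(np.argmin(acc[-1]))]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo = max(0, x - 1)
        seam.append(lo + int(np.argmin(acc[y, lo:min(w, x + 2)])))
    seam.reverse()
    return seam  # seam[y] = column of the optimal stitch line in row y
```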
In summary, the specific process of the proposed two-dimensional digital map modeling method based on UAV aerial images is illustrated in Figure 3.
5. Conclusions
This paper presents a method for creating digital maps from urban building sequence images captured by unmanned aerial vehicles (UAVs), applicable to emergency rescue, accident handling, and similar scenarios. During image processing, the following problems were addressed and the following results were achieved:
- (1)
In the domain of image feature extraction, the C-SURF algorithm is proposed, which demonstrates a more comprehensive feature detection capability compared to the SIFT and SURF algorithms. When detecting features in the same image, the number of detected feature points using the C-SURF algorithm exceeds that of the SIFT algorithm by nearly 40% and that of the SURF algorithm by nearly 48%. This advancement mitigates the issue of certain elements within the image being undetectable as feature points and increases the number of correctly matched point pairs within the same image, thereby enhancing the effectiveness of feature registration;
- (2)
To address the defects in stitched images caused by moving objects in the captured scene, a novel image stitching method based on the novel energy function is proposed. This method not only prevents uneven exposure when stitching two or more images but also significantly reduces the color discrepancies and ghosting artifacts that arise after image fusion. The average signal-to-noise ratio (SNR) of the stitched images increased from 12.6617 dB to 36.1661 dB, indicating a marked improvement in image quality.
Furthermore, in the process of obtaining panoramic images and extracting targets, many areas remain for in-depth study. For instance, optimization of drone aerial image processing techniques under varying lighting conditions and different terrain complexities could be considered. The maps generated in this study do not take geographic coordinate information into account; adding latitude, longitude, and altitude to the current maps to achieve more precise mapping is therefore one direction for future research. Additionally, exploring the integration of the generated maps with GIS environments presents another potential avenue for development.