Applied Sciences
  • Article
  • Open Access

Published: 24 June 2024

An Enhanced Aircraft Carrier Runway Detection Method Based on Image Dehazing

1 College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
2 Department of Instrument Science and Opto-Electronics Engineering, Beihang University, Beijing 100191, China
* Author to whom correspondence should be addressed.
This article belongs to the Collection Advances in Automation and Robotics

Abstract

Carrier-based Unmanned Aerial Vehicle (CUAV) landing is an extremely critical link in the overall chain of CUAV operations on ships. Vision-based landing localization methods have advantages such as low cost and high accuracy. However, when an aircraft carrier is at sea, it may encounter complex weather conditions such as haze, which can lead to vision-based landing failures. This paper proposes a runway line recognition and localization method based on haze removal enhancement to solve this problem. First, a haze removal algorithm using a multi-mechanism, multi-architecture network model is introduced. Compared with traditional algorithms, the proposed model not only consumes less GPU memory but also achieves superior image restoration results. Building on this, we employ the random sample consensus (RANSAC) method to reduce the error in runway line localization. Additionally, extensive experiments conducted in the AirSim simulation environment show that our pipeline effectively addresses the decreased accuracy of runway line detection algorithms in hazy maritime conditions, improving runway line localization accuracy by approximately 85%.

1. Introduction

Driven by the needs of modern national security and development interests, aircraft carriers, as mobile platforms for aircraft takeoff and landing at sea, play a crucial role. Meanwhile, with the advancement of technology, whether in civilian or military scenarios, an increasing number of hardware devices are trending towards unmanned operation. CUAVs deployed on ships, e.g., aircraft carriers and destroyers, are gradually playing a vital role in various tasks such as maritime rescue, reconnaissance, surveillance, and target engagement. Therefore, as a crucial aspect of the entire mission chain of shipborne CUAVs, CUAV landing has gradually become a popular research field in recent years. Typically, CUAV landing methods rely on sensors such as vision, altitude, and laser sensors, with vision sensors gradually becoming the most commonly used sensor type in this field due to their low cost and high accuracy []. One of the most crucial subtasks in landing missions is runway line detection and positioning.
For runway detection, there are conventionally two kinds of methods: traditional image processing methods [,] and deep learning methods [,]. Traditional image processing methods typically rely on image edge detection and often require the assumption of multiple ideal prior conditions, making them unsuitable for practical runway line detection. With the rapid advancement of neural networks over the past few years, runway detection algorithms for unmanned devices increasingly utilize convolutional neural network (CNN) algorithms. These algorithms [,,,] can achieve a well-generalized model by feeding a large amount of data into the network for forward propagation and then fine-tuning the model’s parameters through backpropagation using a loss function. In practical applications, these models are typically pruned and quantized before deployment onto hardware devices. They use camera data for inference to obtain reference results, enabling full process automation without human intervention in controlling device actions. Although these methods have achieved satisfactory runway detection results, carrier operations at sea often encounter adverse weather conditions such as rain and haze. This degrades the image data acquired by visual sensors, which inevitably affects the precision of CUAV runway detection. Undoubtedly, this increases the complexity and risk of CUAV landing missions.
In recent years, there has been a surge in work [,] on restoring hazy outdoor images. However, these methods often focus solely on the quality of image restoration without considering the feasibility of their practical application scenarios, which has led to an excessive number of model parameters and increased computational time.
This paper focuses on the degradation of runway line recognition caused by image sensor degradation in haze conditions and proposes a runway line recognition system based on image dehazing. The main contributions of this paper can be summarized as follows:
  • A lightweight dehazing network that combines CNNs with a Transformer structure is introduced. An effective complementary attention module and a novel Transformer with linear computational complexity are designed to enhance the network’s capability in image restoration;
  • By analyzing the two-dimensional positions of the pixels obtained from the runway detection algorithm, adjusting the landing process of CUAVs using sensor data, performing multiple spatial transformations, and fitting the spatial straight-line expressions of the runway lines, a runway line localization method is proposed;
  • A system framework is designed to address the difficulty CUAVs experience in landing safely and reliably in hazy conditions at sea. We conduct extensive experiments in the AirSim simulation environment. The results show that our system can effectively restore image content details and yield localized runway lines whose deviation from the ground truth is less than 2°.
In the remainder of the paper, we first discuss the related works in Section 2, followed by a detailed exposition of the methods and ideas adopted in the paper in Section 3. Subsequently, in Section 4, we conduct extensive experiments and analyze data to validate the effectiveness and feasibility of the proposed methods. Finally, in Section 5, we summarize the work conducted in the paper.

3. Materials and Methods

To enhance the landing safety of carrier-based unmanned aerial vehicles in hazy maritime conditions, we have devised the method pipeline illustrated in Figure 1. This pipeline primarily comprises three segments: image dehazing, runway line detection, and runway line localization, followed by the assessment of runway line localization error. Accordingly, this section is organized into three parts. Firstly, we provide a detailed analysis and explanation of the dehazing model architecture designed in this paper, along with each of its modules. Secondly, we derive and elucidate the formulas for the runway localization method and error analysis adopted in this paper. Finally, we describe the construction of the simulation environment and the data collection methods for the dataset used in this paper.
Figure 1. The overall pipeline framework.

3.1. Dehaze Implementation

In this part, we first provide a detailed explanation of the overall workflow of the network, followed by descriptions of the structures and functions of each basic module used in the network.

3.1.1. Network Overall

The overall framework of the network, as depicted in Figure 2, is a U-shaped hierarchical structure consisting of symmetric encoders and decoders. The encoder is composed of down-sampling modules, MixAttention modules, and Mobilevit modules, while the decoder consists of up-sampling modules, MixAttention modules, and SKFusion [] modules. Each level of the encoder and decoder has a skip connection; the SKFusion module fuses the input of each decoder layer with the output of the skip connection and passes the result to the next layer. With each down-sampling module, the channel number of the feature map doubles while the spatial dimension halves; conversely, with each up-sampling module, the channel number halves while the spatial dimension doubles. A hazy image $I \in \mathbb{R}^{H \times W \times 3}$ first passes through an introduction module to obtain low-dimensional feature information $F_s \in \mathbb{R}^{H \times W \times C}$. It then undergoes a bottom-up encoding process to gradually obtain deep, high-dimensional features $F_d \in \mathbb{R}^{\frac{H}{4} \times \frac{W}{4} \times 4C}$. Subsequently, it passes through a top-down decoder and a convolution with a kernel size of 3 × 3 to obtain residual features $F_{res} \in \mathbb{R}^{H \times W \times 4}$. Finally, $F_{res}$ is split into two feature maps with different channel dimensions, $K \in \mathbb{R}^{H \times W \times 1}$ and $B \in \mathbb{R}^{H \times W \times 3}$. Then, by analogy with the classic Atmospheric Scattering Model, the dehazed image $\hat{I}$ is reconstructed via element-wise multiplication between $K$ and the input, i.e., $\hat{I} = K \odot I + B + I$, where $\odot$ denotes element-wise multiplication. For the downsample and upsample operations, we use a convolution with a kernel size of 2 and a stride of 2, and PixelShuffle with an upsampling factor of 2, respectively.
Figure 2. Network architecture of the proposed model. The model is a modified 5-stage U-shape Net. The input shape is H ×W × C and the feature maps in each stage are shown below every block.
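To make the data flow above concrete, the following PyTorch sketch shows the soft-reconstruction head and the down-/up-sampling operators as described; the module names and channel settings are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class SoftReconstruction(nn.Module):
    """Sketch of the final stage described above: a 3x3 convolution produces a
    4-channel residual map F_res, which is split into K (1 channel) and B
    (3 channels) and combined with the hazy input as I_hat = K * I + B + I."""

    def __init__(self, dec_channels: int):
        super().__init__()
        self.to_res = nn.Conv2d(dec_channels, 4, kernel_size=3, padding=1)

    def forward(self, dec_feat: torch.Tensor, hazy: torch.Tensor) -> torch.Tensor:
        f_res = self.to_res(dec_feat)              # (N, 4, H, W)
        k, b = torch.split(f_res, [1, 3], dim=1)   # K: (N,1,H,W), B: (N,3,H,W)
        return k * hazy + b + hazy                 # element-wise, broadcast over RGB

# Down-/up-sampling as described: a stride-2 convolution halves the spatial size
# and doubles the channels; PixelShuffle(2) doubles the spatial size and halves them.
def downsample(c_in: int) -> nn.Module:
    return nn.Conv2d(c_in, 2 * c_in, kernel_size=2, stride=2)

def upsample(c_in: int) -> nn.Module:
    return nn.Sequential(nn.Conv2d(c_in, 2 * c_in, kernel_size=1), nn.PixelShuffle(2))
```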

3.1.2. MixAttention Module

In hazed images, compared to clear ones, there are primarily two differences: decreased image contrast and loss of edge and texture information. To address these issues, we believe that appropriate attention modules should be employed to enable the network to selectively focus on specific information in the image. Inspired by recent advances in attention mechanisms in machine learning, we propose the MixAttention module. The module integrates simple spatial attention (SPA) and simple channel attention (SCA) mechanisms with a feed-forward network (FFN), and fuses the feature information from the two attention modules through a linear mapping layer. The simple spatial attention mechanism primarily calculates cross-pixel attention weights for multi-dimensional features, emphasizing edges, textures, and dense blurry regions within the image. Meanwhile, the channel attention mechanism computes cross-channel attention weights, focusing on the color distribution characteristics of the image. The input of the MixAttention module undergoes Layer Normalization followed by pixel-wise and depth-wise convolutional layers. Subsequently, after non-linear activation, the features are passed through SCA and SPA to obtain cross-channel and cross-pixel fused feature information. After attention fusion, the FFN increases the dimensionality of the feature maps obtained through the attention operations, applies non-linear activation functions, and then reduces the dimensionality to enhance feature representation capacity. The MixAttention module is shown in Figure 3.
Figure 3. The structure of MixAttention architecture. In the attention fusion part, it consists of two different attention mechanisms: SPA and SCA.
The implementation details of SPA and SCA can be found in Figure 4. From the figure, it can be observed that the attention computation of both SPA and SCA is divided into two branches, and the attention-weighted output is obtained after element-wise multiplication. The difference lies in the fact that when the feature map $F_{non} \in \mathbb{R}^{H \times W \times C}$ passes through the non-linear branch of SCA, its shape changes to $\mathbb{R}^{C}$, whereas SPA does not reshape it. SPA and SCA can be individually formulated as:
$$F_{spa}[h,w,k] = \sigma\left(\sum_{c=1}^{C} w(c)\, F_{non}[:,:,c+k]\right) \times \sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{c=1}^{C} w[i,j,c]\, F_{non}[h+i,\, w+j,\, c+k],$$
$$F_{sca}[h,w,k] = \sigma\left(\sum_{c=1}^{C} w(c)\, P\!\left(F_{non}[:,:,c+k]\right)\right) \times F_{non}[h,w,k],$$
where $\sigma$ denotes the Sigmoid activation function, $\times$ denotes element-wise multiplication, $w(c)$ and $w[i,j,c]$ are the attention weights at point $[:,:,c]$ or $[i,j,c]$ of the channel and spatial attention mechanisms, respectively, $P$ represents the global average pooling operation, $[h,w,k]$ is the pixel coordinate of the feature map, and $F_{non}$ is the input feature map.
Figure 4. Implementation details of the SPA and SCA module.
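The following is a minimal PyTorch sketch of one plausible reading of the SCA and SPA branches formulated above; the kernel sizes and the exact placement of the point-wise and depth-wise convolutions are assumptions.

```python
import torch
import torch.nn as nn

class SCA(nn.Module):
    """Simple channel attention: global average pooling (P) followed by a 1x1
    convolution produces per-channel sigmoid gates that re-weight F_non."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # P(.) in the SCA formula
        self.pw = nn.Conv2d(channels, channels, 1)   # channel-mixing weights w(c)
        self.gate = nn.Sigmoid()

    def forward(self, f_non: torch.Tensor) -> torch.Tensor:
        return self.gate(self.pw(self.pool(f_non))) * f_non

class SPA(nn.Module):
    """Simple spatial attention: a point-wise branch produces a sigmoid gate that
    keeps spatial resolution, while a depth-wise branch aggregates local spatial
    context; the two branches are multiplied element-wise."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.pw = nn.Conv2d(channels, channels, 1)
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)
        self.gate = nn.Sigmoid()

    def forward(self, f_non: torch.Tensor) -> torch.Tensor:
        return self.gate(self.pw(f_non)) * self.dw(f_non)
```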

3.1.3. Mobilevitv3 Module

We introduce Mobilevitv3 to enhance the feature representation of the image. The module comprises a local block, a global block, and a fusion block. The local block functions as a pre-processing stage to extract the local representation and is composed of a depth-wise separable convolutional layer and a point-wise convolutional layer. The global block acts as the central unit, learning long-range dependencies between pixels as well as the global representation through multiple Linear Transformers; it is composed of a stack of transformers based on separable self-attention and a linear mapping layer. The fusion block is a linear mapping layer composed of a convolution with a kernel size of 1 × 1. The input of the Mobilevitv3 module is processed by the local representation block and the global representation block, respectively. Subsequently, after non-linear activation, the features are passed through the fusion block to obtain the final feature information. The Mobilevitv3 module is shown in Figure 5. The self-attention mechanism in the Linear Transformer module differs from the traditional multi-head self-attention mechanism in Transformers; instead, the module employs linear separable self-attention (SSA). This attention mechanism replaces scaled dot-product attention between tokens with element-wise products between tokens, reducing the computational complexity from quadratic in the number of tokens to linear. The computational process of SSA is shown in Figure 6.
Figure 5. The structure of mobilevitv3 model.
Figure 6. Calculation process of SSA.
First, the input tokens $f \in \mathbb{R}^{C \times P \times N}$, where $C$ is the number of channels, $P$ the number of tokens, and $N$ the dimension of each token, are normalized by layer normalization, so that the standard deviation and mean of the features of each sample are normalized to 1 and 0, respectively. They are then subjected to a linear mapping and divided into three unequal branches: input $I \in \mathbb{R}^{1 \times P \times N}$, key $K \in \mathbb{R}^{d \times P \times N}$, and value $V \in \mathbb{R}^{d \times P \times N}$. The $I$ branch is processed by a softmax operation to obtain content scores $c_s$, which are element-wise multiplied by $K$. After summation, the content vector $c_v$ is obtained and element-wise multiplied by $V$. The result then undergoes another linear mapping and is passed to the feed-forward unit for non-linear processing. The process can be depicted as:
$$c_s = \mathrm{softmax}(I), \qquad c_v = \sum_{i=1}^{d} c_s \times K, \qquad \mathrm{SSA} = H_{linear}(c_v) \times V,$$
where $d$ is the number of channels of $K$ and $V$, $H_{linear}$ represents the linear mapping layer, and $\times$ denotes the element-wise product. The output of the SSA module is the final feature representation of the input tokens.
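A compact PyTorch sketch of separable self-attention as described above is given below; the projection widths and the output head are assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparableSelfAttention(nn.Module):
    """Minimal SSA sketch for token tensors of shape (B, P, N): a 1-channel
    branch I gives token-wise content scores via softmax, a context vector is
    accumulated from K, then broadcast through a linear map and multiplied
    element-wise with V, so the cost is linear in the number of tokens P."""
    def __init__(self, token_dim: int, d: int):
        super().__init__()
        self.d = d
        self.proj = nn.Linear(token_dim, 1 + 2 * d)  # splits into I (1), K (d), V (d)
        self.h_linear = nn.Linear(d, d)              # H_linear in the SSA formula
        self.out = nn.Linear(d, token_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        i, k, v = torch.split(self.proj(x), [1, self.d, self.d], dim=-1)
        cs = F.softmax(i, dim=1)                 # content scores over the P tokens
        cv = (cs * k).sum(dim=1, keepdim=True)   # context vector, shape (B, 1, d)
        y = self.h_linear(cv) * v                # broadcast element-wise product with V
        return self.out(y)
```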

3.2. Runway Localization

After the haze removal algorithm processes the image, the UFLD algorithm infers runway points on the image. Due to various factors, it is necessary to filter and fit these detection points to accurately determine the positions of the detected left and right runway lines in the three-dimensional coordinate system. Furthermore, during runway line localization we utilize pixel depth information, which we obtain through two methods. When the CUAV is at a considerable distance from the carrier, accurate depth values cannot be obtained from the depth camera. Considering that there is an angle between the CUAV’s heading direction and the normal vector of the sea surface, we can approximate the depth information to achieve coarse positioning of the runway lines. Conversely, when the CUAV is closer to the carrier, the depth camera can accurately capture depth values, enabling precise positioning of the runway lines directly.
Firstly, the runway points extracted by UFLD are shown in Figure 7. Due to the limited accuracy of the runway detection algorithm and the uncertainty introduced when manually annotating runway lines in images, points inferred by the network may deviate from the extension direction of the runway lines. As illustrated in Figure 7, the first three runway points on the right side exhibit significant errors. Therefore, when fitting the runway lines, it is necessary to eliminate runway points that cause large errors. Handling this problem in three-dimensional space is complex, so the runway is mapped to a two-dimensional space. Additionally, since points on the runway lines should share the same height value, arithmetic operations can be performed in the pixel coordinate system. We first apply the Random Sample Consensus (RANSAC) method in the pixel coordinate system to remove outliers, facilitating the subsequent line-fitting step in the global coordinate system, as sketched below. The process can be described as follows: assuming there are i points in total, based on the principle of RANSAC we randomly select k points to perform a least squares fit (LSF) of a line. We use the distance from each point to the line as the error term and set a minimum error range, then count the number of points falling within this range. If the number of points meets the criterion, we consider the line’s weights appropriate. We then update the minimum error and recalculate until the minimum error no longer decreases.
Figure 7. Runway points detection result.
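A minimal numpy sketch of the RANSAC filtering step described above; the thresholds, iteration count, and the final polyfit refit are illustrative assumptions.

```python
import numpy as np

def ransac_line(points_uv: np.ndarray, n_iters: int = 200,
                inlier_thresh: float = 3.0, min_inliers: int = 6):
    """Repeatedly fit a line v = a*u + b to a random pair of detected runway
    points, count points whose pixel distance to the line is within
    inlier_thresh, and keep the consensus set of the best hypothesis."""
    best_inliers = None
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        i, j = rng.choice(len(points_uv), size=2, replace=False)
        (u1, v1), (u2, v2) = points_uv[i], points_uv[j]
        if abs(u2 - u1) < 1e-6:
            continue
        a = (v2 - v1) / (u2 - u1)
        b = v1 - a * u1
        # point-to-line distance for a*u - v + b = 0
        dist = np.abs(a * points_uv[:, 0] - points_uv[:, 1] + b) / np.sqrt(a * a + 1)
        inliers = dist < inlier_thresh
        if inliers.sum() >= min_inliers and (
                best_inliers is None or inliers.sum() > best_inliers.sum()):
            best_inliers = inliers
    if best_inliers is None:
        best_inliers = np.ones(len(points_uv), dtype=bool)
    # refit with least squares on the consensus set only
    a, b = np.polyfit(points_uv[best_inliers, 0], points_uv[best_inliers, 1], deg=1)
    return a, b, best_inliers
```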
Secondly, after removing outliers, we need to map each point into three-dimensional space to fit the position of the runway line in the global coordinate system, achieving runway detection and localization. Considering the trajectory of a CUAV from the starting point to a position above the flight deck, we can simplify this process as illustrated in Figure 8. Assume that at the initial time instance the position of the CUAV is $p_b^0$ with attitude $R_b^0$, and that when flying above the flight deck the CUAV detects a point on the runway line at pixel coordinates $p_{pix}^i$. At this moment, the CUAV has moved to position $p_b^1$ with attitude $R_b^1$, and the coordinates of this point in the body coordinate system of the initial time instance, $p_g^i$, can be written as:
$$p_{cam}^{i} = depth \cdot \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}^{-1} \cdot p_{pix}^{i}, \quad i = 1, 2, 3, \ldots$$
$$p_{body}^{i} = R_c^{b} \cdot p_{cam}^{i} + p_b^{c}, \qquad p_g^{i} = R_b^{1} \cdot (R_b^{0})^{-1} \cdot p_{body}^{i} + p_b^{1} - p_b^{0}.$$
Figure 8. Simulation of carrier-based unmanned aerial vehicle landing task process in our simulated environment.
$p_{cam}^{i}$ denotes the coordinates of pixel point $p_{pix}^{i}$ in the camera coordinate system, where $i$ indexes any point on the runway line, and $p_{body}^{i}$ represents the coordinates of this point in the body coordinate system. Additionally, $f_x, f_y, c_x, c_y$ are the camera intrinsic parameters, and $R_c^{b}$ and $p_b^{c}$ are the rotation matrix and translation vector of the camera coordinate system relative to the body coordinate system, respectively. The above process is repeated for each point on the runway line to obtain its coordinates in the global coordinate system. Assuming the line equation is $f(x_1, x_2, x_3) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3$ and that $k$ points are selected in total, the fitted runway line can be expressed through:
$$w^{T} = (w_0, w_1, w_2, w_3), \qquad x = (1, x_1, x_2, x_3), \qquad L(w, x) = \sum_{i=1}^{k} \left( y_i - w^{T} x_i \right)^2.$$
Our purpose is to minimize the loss function $L(w, x)$, so we employ stochastic gradient descent to obtain the weight coefficients that minimize it:
$$w_t = w_{t-1} - \eta \frac{\partial L}{\partial w},$$
where $\eta$ is the step size, $w_t$ is the weight coefficient at step $t$, and $w_{t-1}$ is the weight coefficient at step $t-1$. The process is repeated until the loss function converges to a minimum value.
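The back-projection chain and the gradient-descent line fit can be sketched as follows; the function signatures and the 2D reduction used in `fit_line_gd` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def pixel_to_global(p_pix, depth, K, R_cb, p_bc, R_b0, p_b0, R_b1, p_b1):
    """Chain of transforms above: pixel -> camera frame (intrinsics K and the
    depth value), camera -> body frame (R_c^b, p_b^c), then into the body frame
    of the initial pose. All rotations/vectors are assumed to be numpy arrays."""
    p_cam = depth * (np.linalg.inv(K) @ np.array([p_pix[0], p_pix[1], 1.0]))
    p_body = R_cb @ p_cam + p_bc
    return R_b1 @ np.linalg.inv(R_b0) @ p_body + (p_b1 - p_b0)

def fit_line_gd(x, y, lr=1e-4, steps=20000):
    """Gradient-descent minimization of the squared loss sum_i (y_i - w^T x_i)^2
    for a 2D reduction y = w0 + w1*x (the deck height is handled separately)."""
    X = np.stack([np.ones_like(x), x], axis=1)
    w = np.zeros(2)
    for _ in range(steps):
        grad = -2.0 * X.T @ (y - X @ w) / len(x)
        w = w - lr * grad          # w_t = w_{t-1} - eta * dL/dw
    return w                       # w[0]: intercept, w[1]: slope
```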
Finally, after performing the above steps, we obtain the equations of the fitted straight lines for the left runway line, $f_l(x_l, y_l, z_l) = w_{l0} + w_{l1} x_l + w_{l2} y_l + w_{l3} z_l$, and the right runway line, $f_r(x_r, y_r, z_r) = w_{r0} + w_{r1} x_r + w_{r2} y_r + w_{r3} z_r$. The runway lines should be parallel ($w_{l1} = w_{r1}$, $w_{l2} = w_{r2}$, $w_{l3} = w_{r3}$), differing only in their constant terms. Therefore, the equation of the estimated central axis line can be expressed as $f_c(x_c, y_c, z_c) = \frac{w_{r0} + w_{l0}}{2} + w_{r1} x_c + w_{r2} y_c + w_{r3} z_c$. Hence, for the CUAV to align with the central axis of the runway, it is necessary to control the CUAV’s forward normal vector to align with the direction of the central axis line, while keeping the CUAV’s center of mass as close as possible to the central axis line.

3.3. Dataset

The dataset was generated with AirSim (ver. 1.8.1) [], an open-source cross-platform simulator built on the Unreal Engine (ver. 4.27). The excellent visual rendering power of Unreal Engine enables the simulator to produce high-quality visual simulation effects, making it suitable for a wide range of applications and research areas related to machine learning and artificial intelligence, such as deep learning, computer vision, and automated visual data collection. The overall simulation uses the aircraft carrier Ronald Reagan (CVN-76) and the shipboard fighter J-15 to construct a landing scenario in an ocean environment. The carrier-based fighter took off from the deck of the aircraft carrier, flew to a distance of about 10 km from the carrier to enter the holding track, and then began its return, entering the landing track at a distance of about 5 km from the carrier, lowering its altitude to 300 m, then to 60 m at a distance of 800 m from the carrier, and finally gliding down to land on the carrier’s runway. Throughout the landing process, the center of the camera was pointed at the runway on the deck of the carrier from the moment the aircraft entered the landing pattern. The ground truth of the trajectory was recorded by AirSim.
First, the geometric model of camera imaging needs to be established, that is, to obtain the camera’s internal parameter matrix K. The internal parameter matrix K can be expressed approximately as:
$$K = \begin{bmatrix} f/s & 0 & w/2 \\ 0 & f/s & h/2 \\ 0 & 0 & 1 \end{bmatrix},$$
where f represents the focal length, s represents the pixel size, and w and h represent the resolution of the image.
The camera in AirSim uses a perspective projection, so the focal length f and the pixel size s can be expressed as:
$$f = \frac{1}{\tan(fov/2)}, \qquad s = \frac{2}{w},$$
Consequently, in the AirSim simulation, the internal parameter matrix K of the camera can be expressed as:
$$K = \begin{bmatrix} \dfrac{w}{2\tan(fov/2)} & 0 & w/2 \\ 0 & \dfrac{w}{2\tan(fov/2)} & h/2 \\ 0 & 0 & 1 \end{bmatrix}.$$
The parameters of the camera geometric model obtained through setup and computation, such as the camera internal parameter matrix K, the distortion coefficients, and the field of view, are summarised in Table 3.
Table 3. Parameters of camera model.
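Assuming the perspective-projection relations above, the intrinsic matrix can be assembled as follows; the 90° FOV and 1280 × 720 resolution in the example are illustrative values.

```python
import numpy as np

def intrinsic_from_fov(width: int, height: int, fov_deg: float) -> np.ndarray:
    """Build the pinhole intrinsic matrix K described above for a perspective
    camera defined only by its field of view and image resolution:
    f = 1/tan(fov/2), s = 2/w, hence f/s = w / (2 * tan(fov/2))."""
    fov = np.deg2rad(fov_deg)
    f_over_s = width / (2.0 * np.tan(fov / 2.0))
    return np.array([[f_over_s, 0.0,      width / 2.0],
                     [0.0,      f_over_s, height / 2.0],
                     [0.0,      0.0,      1.0]])

# Example: a 90-degree FOV at 1280x720 gives f/s = 640 (values illustrative).
K = intrinsic_from_fov(1280, 720, 90.0)
```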
Next, the attitude change from the camera coordinate frame to the world coordinate frame still needs to be solved, that is, the camera’s external parameters. We set a point directly in front of the deck of Ronald Reagan (CVN-76) as the origin of the world coordinate frame; using the right-handed North-East-Down (NED) convention, the carrier’s coordinates in the world coordinate frame are (−148.45, 47.2, 31.26). The body coordinate frame uses the right-handed Forward-Right-Down (FRD) convention, which means the forward direction of the airframe is the x-axis, the direction perpendicular to the x-axis pointing right is the y-axis, and the direction perpendicular to both, pointing downward, is the z-axis, as shown in Figure 9, which presents the side view, front view, and top view in sequence. As shown in Figure 10, the area enclosed by the red line is defined as the runway area.
Figure 9. Definition of body coordinate and three views of the J-15. (a) The side view of J-15. (b) The front view of J-15. (c) The top view of J-15.
Figure 10. Runway area of Ronald Reagan.
The final dataset acquired in the AirSim simulation consists of visible images paired with depth maps captured at exactly the same camera position; AirSim’s recording feature was used to continuously acquire the images and ground-truth values.

4. Experiment Results and Discussions

In this section, to validate the optimization effect of our algorithm on the challenging task of carrier-based unmanned aerial vehicles landing on a ship deck in hazy conditions, we conducted numerous experiments in a simulated maritime environment. We separately verified the dehazing effect of the proposed algorithm in a simulated hazy maritime environment and evaluated the performance of runway detection and adjustment before and after dehazing, quantifying the errors.

4.1. Dehazing Experiment

In this part, we first introduce the parameter settings and data preparation for the dehazing experiment. Then, we present the performance evaluation metrics and results of the dehazing experiment.

4.1.1. Parameter Settings

To accelerate the training of the model and obtain better generalization performance, we use the AdamW optimizer. The optimizer parameters $\beta_1 = 0.9$ and $\beta_2 = 0.999$ follow the currently popular settings. Considering the high total number of iterations used in our training, we set the weight decay of the optimizer to $4 \times 10^{-3}$. The initial learning rate is set to $5 \times 10^{-4}$ and is gradually reduced to $1 \times 10^{-7}$ over 600 K iterations using a cosine annealing strategy. The batch size is set to 16 and the input images are cropped to 512 × 512. The loss function is the Peak Signal-to-Noise Ratio loss (PSNRLoss) []. Training is conducted on a single NVIDIA RTX 4090 GPU with a 13th Gen Intel(R) Core(TM) i9-13900KF CPU. The network is implemented with PyTorch 1.11.0, CUDA 11.3, and cuDNN 8.6.2.
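A hypothetical training-configuration sketch consistent with the settings above is given below; `model` is a stand-in module and the training loop is only indicated in comments.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Conv2d(3, 3, 3, padding=1)          # stand-in for the dehazing net
optimizer = AdamW(model.parameters(), lr=5e-4,
                  betas=(0.9, 0.999), weight_decay=4e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=600_000, eta_min=1e-7)

def psnr_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """PSNRLoss in the HiNet style: the negative PSNR of the batch
    (for images scaled to [0, 1], -PSNR = 10 * log10(MSE))."""
    mse = torch.mean((pred - target) ** 2, dim=(1, 2, 3)).clamp_min(1e-8)
    return torch.mean(10.0 * torch.log10(mse))

# for hazy, clear in train_loader:
#     optimizer.zero_grad()
#     loss = psnr_loss(model(hazy), clear)
#     loss.backward(); optimizer.step(); scheduler.step()
```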
In the simulation environment, to obtain the dataset for training the model, we control the CUAV to fly near the deck position and capture 600 pairs of clear and hazy images using the onboard camera. Among these, we selected 421 pairs as the training set and the remaining 179 pairs as the test set. Each image in the dataset has a size of 1280 × 720 pixels. To reduce GPU memory consumption during training, we divided each image into eight equally sized sub-images of 512 × 512 pixels, with overlapping regions between adjacent sub-images. Through this operation, the numbers of image pairs in the training and test sets are increased to 3368 and 1512 pairs, respectively. Additionally, we employed image augmentation techniques such as horizontal and vertical flipping to enhance the diversity of the dataset and improve the generalization ability of the trained model. The metrics, including Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM) [], and the loss values during training, are shown in Figure 11a, Figure 11b and Figure 11c, respectively.
Figure 11. Reference values of various metrics during dehazing model training.
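The splitting and augmentation steps described above can be sketched as follows; the 4 × 2 crop grid is an assumption, since the exact overlap between sub-images is not specified.

```python
import numpy as np

def split_with_overlap(img: np.ndarray, patch: int = 512,
                       n_cols: int = 4, n_rows: int = 2):
    """Cut a 1280x720 image into eight 512x512 sub-images (4 columns x 2 rows)
    whose crops overlap because 4*512 > 1280 and 2*512 > 720."""
    h, w = img.shape[:2]
    ys = np.linspace(0, h - patch, n_rows).astype(int)
    xs = np.linspace(0, w - patch, n_cols).astype(int)
    return [img[y:y + patch, x:x + patch] for y in ys for x in xs]

def augment(img: np.ndarray):
    """Horizontal and vertical flip augmentations mentioned above."""
    return [img, img[:, ::-1], img[::-1, :]]
```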

4.1.2. Performance Evaluation

The evaluation metrics used in the dehazing experiment are PSNR and the SSIM index. The PSNR metric, which quantifies the reconstruction quality of the restored image, is based on the Mean Squared Error (MSE) between the ground truth image y and the restored image x. For an 8-bit RGB image, the formula for PSNR is as follows:
$$MSE(x, y) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left\| y_{i,j} - x_{i,j} \right\|_2^2, \qquad PSNR = 10 \cdot \log_{10} \frac{255^2}{MSE(x, y)},$$
where H and W are the height and width of the image, and $y_{i,j}$ and $x_{i,j}$ are the pixel values of the ground truth image and the restored image, respectively. In addition, the SSIM metric approximates the similarity in luminance, contrast, and structure between the restored image and the ground truth image. The formula for SSIM is as follows:
$$SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$
where $\mu_x$, $\mu_y$ are the mean values of the restored image and the ground truth image, $\sigma_x^2$, $\sigma_y^2$ are their variances, $\sigma_{xy}$ is their covariance, and $C_1$ and $C_2$ are constants that stabilize the division when the denominator is weak. In addition, the PSNRLoss can be calculated as:
$$PSNRLoss = -PSNR(x, y).$$
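For reference, these metrics can be computed directly from their definitions; the SSIM sketch below uses a single global window, whereas standard implementations average over local (Gaussian) windows.

```python
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray) -> float:
    """PSNR for 8-bit images, following the MSE-based definition above."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / max(mse, 1e-12))

def ssim_global(x: np.ndarray, y: np.ndarray) -> float:
    """Single-window SSIM following the formula above (C1, C2 are the usual
    stabilizing constants for an 8-bit dynamic range)."""
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    x, y = x.astype(np.float64), y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```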
To evaluate the dehazing performance of our algorithm, we compared it with five other state-of-the-art (SOTA) algorithms based on four quantitative metrics: PSNR, SSIM, model parameters, and multiply-accumulate operations (MACs). The quantitative evaluation comparison is shown in Table 4.
Table 4. Comparison of various models on dehazing performance metrics, model parameters, memory consumption and latency.
As shown in Table 4, our model achieves the best image quality indicators with the fewest model parameters and the lowest memory consumption. Specifically, our model achieves a PSNR of 30.18 dB and an SSIM of 0.972; compared with DehazeFormer [], it reduces the number of parameters by 29.44% and the memory/computational cost by 26.27%, while improving PSNR by 1.07 dB. The qualitative comparison of the dehazing effect is shown in Figure 12 and Figure 13. The results show that our model can effectively remove the haze in the image and restore a clear image, achieving the best visual effect among the six algorithms.
Figure 12. Visual comparison of dehazing effects of different models on sequence 1.
Figure 13. Visual comparison of dehazing effects of different models on sequence 2.
In addition, we conducted ablation experiments to verify the effectiveness of the network modules used in this study and the impact of the image augmentation methods used during training on the training metrics of the network model. In the ablation experiments, we replaced the MixAttention module and the Mobilevitv3 module used in this paper with conventional depthwise separable convolutions and the transformer module used in [], respectively, and used a configuration without image splitting and image augmentation as the baseline. The results are shown in Table 5.
Table 5. Results of ablation experiments.
According to the ablation results in Table 5, replacing conventional depthwise separable convolutions with the MixAttention module increased the PSNR by 0.52 dB. Subsequently, replacing the transformer module with the Mobilevitv3 module used in this paper led to a further increase in PSNR of 0.64 dB. Finally, applying data augmentation to the dataset resulted in a PSNR improvement of 0.08 dB. These results demonstrate the effectiveness of the network modules used in this paper and the impact of the image augmentation method on the training metrics of the network model.

4.2. Runway Localization Analysis

In this part, we primarily elaborate on two aspects. Firstly, we compare and analyze the runway detection performance before and after dehazing. Subsequently, we quantify the errors of the localization equations for the left and right runway lines.

4.2.1. Dehazing Effect

To demonstrate the necessity of image dehazing, we selected two hazed images as input and observed the runway detection performance before and after dehazing. The results are shown in Figure 14. From Figure 14, we can observe that in the hazed image $I_1$, compared to the dehazed image $R_1$, the UFLD algorithm detects fewer points, and the extension directions of these points deviate significantly from the direction of the ground truth (GT) runway lines. These two defects can lead to large errors in the fitted runway line equations, which are undoubtedly fatal for the landing process of the CUAV. This is mainly because, in hazed images, the average pixel value rises while the variance is reduced due to image degradation. As a result, some of the edge information in the image is lost and the detection algorithm cannot effectively detect the edges of the runway lines, leading to a decrease in detection performance. Dehazing mitigates this phenomenon, thereby enhancing the performance of runway line detection algorithms.
Figure 14. Results of runway detection algorithms before and after dehazing.
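The contrast argument above can be checked with a few lines of numpy; the variable names in the comments are illustrative.

```python
import numpy as np

def contrast_stats(img: np.ndarray):
    """Quick check of the degradation described above: haze raises the mean
    pixel value and shrinks the standard deviation, washing out edges."""
    gray = img.mean(axis=2) if img.ndim == 3 else img
    return float(gray.mean()), float(gray.std())

# mean_hazy, std_hazy = contrast_stats(hazy_image)      # e.g. higher mean, lower std
# mean_dehazed, std_dehazed = contrast_stats(dehazed)   # restored contrast
```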

4.2.2. Localization Details

Taking the left runway line as an example (Figure 15a), each runway line consists of two white lines and several yellow lines intersecting the white lines. For clarity of representation, we simplify the two white lines as two green lines and depict the centerline as a red line. The predicted runway line is drawn in blue.
Figure 15. The predicted result by UFLD.
In Figure 15b, we use the UFLD algorithm to detect the centerlines of the left and right runway lines. Considering that the deck of an aircraft carrier or similar vessel is a flat surface, the height values of the detected points should be close to each other. Based on this, we reduce the multidimensional linear regression problem to a two-dimensional regression problem. Hence, the predicted centerline can be written as:
$$f(x, y, z) = w_0 + w_1 x + w_2 y, \qquad z = w_3 = height,$$
where $w_0$, $w_1$, $w_2$ are the weight coefficients of the centerline equation and $z = w_3 = height$ is the height value of the detected points. Consequently, the slope $k = w_1 / w_2$ and the intercept $b = w_0 / w_2$ can be calculated.
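A small helper, under the definitions above, for recovering the slope/intercept of a fitted line and the central axis from the left and right fits; the sign conventions follow the text and the inputs are assumed to be numpy weight vectors (w0, w1, w2, w3).

```python
import numpy as np

def line_params(w0: float, w1: float, w2: float):
    """Slope k = w1/w2 and intercept b = w0/w2 of the fitted centerline
    in the deck plane, as defined above."""
    return w1 / w2, w0 / w2

def central_axis(w_left: np.ndarray, w_right: np.ndarray) -> np.ndarray:
    """Average the constant terms of two (ideally parallel) runway-line fits,
    keeping the direction coefficients, as in the expression for f_c above."""
    w_c = w_right.copy()
    w_c[0] = 0.5 * (w_left[0] + w_right[0])
    return w_c
```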
To quantify the impact of dehazing on runway detection accuracy, we used the UFLD algorithm as the baseline and compared it with two other conventional runway line detection algorithms. We conducted ten experiments in total, calculating the average slope k, intercept b, and height h of the fitted lines. The angle difference Δθ and intercept difference Δb between the fitted runway lines and the ground truth runway lines are used as evaluation metrics. The experimental results are shown in Table 6.
Table 6. Comparison of localization results for each algorithm’s performance.
After our calculations in the simulation environment, the actual equations for the centerlines of the left and right runways should be:
$$f_{left}(x, y) = 0.16300\,x + y - 35.788523, \qquad z = 6.0800000,$$
$$f_{right}(x, y) = 0.15632\,x + y - 20.889669, \qquad z = 6.0800000.$$
From Table 6, it can be seen that all four listed methods show little error in the height direction compared to the ground truth, but the slope and intercept values show significant differences for the UFLD [], Lanedet [], and SAD [] algorithms. For example, the slope and intercept of the left runway line fitted by the UFLD algorithm are 0.15 and 9.25 m less than the ground truth, respectively, with an angle difference of 8.5456° between the fitted and ground truth left runway lines. The slope and intercept of the right runway line fitted by the UFLD algorithm are approximately 0.07 and 5.33 m less than the ground truth, respectively, with an angle difference of 3.9949°. After enhancement by our algorithm, the differences between the slope and intercept of the left runway line and the ground truth are reduced to 0.02489 and 1.85 m, respectively, with the angle difference reduced to 1.3347°. For the right runway line, the differences in slope and intercept are reduced to 0.00122 and 0.051 m, respectively, with the angle difference reduced to 0.0559°. It can be observed that the runway detection accuracy after dehazing improves by approximately 86%. In summary, our entire algorithm pipeline can effectively improve the landing accuracy of CUAVs under hazed conditions, thus enhancing the safety of the landing process.

5. Conclusions

In this paper, to address the reduced accuracy of the onboard sensors of CUAVs in hazy weather conditions, which leads to unreliable and unsafe execution of landing tasks, we propose a dehazing algorithm together with a runway detection and localization algorithm.
In the dehazing network model, we propose a U-shaped network structure, introduce a lightweight fused attention module combining simple spatial attention and simple channel attention, and utilize a Transformer module with linear computational complexity. This design ensures both enhanced dehazing capability and low computational cost during forward propagation. Considering real-time requirements, we select the UFLD algorithm as the runway detection algorithm. By filtering and fitting the detected points, we obtain highly accurate runway line equations.
Finally, we conducted extensive experiments on our entire algorithm framework. We compared the performance of our dehazing model with other state-of-the-art models, as well as the localization results obtained from our dehazing algorithm and localization method. The experiments demonstrate that our dehazing algorithm achieves the best dehazing effect while significantly improving runway detection performance under hazed conditions.
Additionally, to further ensure the safe and reliable landing of carrier-based UAVs, it is important to note the following: First, our image dehazing model may not address other types of image degradation, such as rainy images or motion-blurred images. Second, due to the limitations of the dataset used for training, the optimization effect on images in heavy fog conditions may not be satisfactory. Third, this paper does not address the control decision methods for carrier-based UAVs after detecting the runway lines. Our future work will focus on designing better algorithms to address these issues.

Author Contributions

Conceptualization, C.L.; methodology, C.L. and Y.W.; software, C.L. and Y.W.; validation, Y.Z., R.M. and Y.W.; formal analysis, R.M., and C.Y.; investigation, P.L.; resources, P.L.; data curation, R.M.; writing—original draft preparation, C.L.; writing—review and editing, P.L. and C.Y.; visualization, C.L.; supervision, P.L.; project administration, P.L.; funding acquisition, P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62273178, and in part by the National Key Research and Development Program of China U2233215 and in part by Phase VI 333 Engineering Training Support Project.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Under reasonable conditions, all datasets used in this paper are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ma, N.; Weng, X.; Cao, Y.; Wu, L. Monocular-Vision-Based Precise Runway Detection Applied to State Estimation for Carrier-Based UAV Landing. Sensors 2022, 22, 8385. [Google Scholar] [CrossRef] [PubMed]
  2. Wang, Y.; Teoh, E.K.; Shen, D. Lane detection and tracking using B-Snake. Image Vis. Comput. 2004, 22, 269–280. [Google Scholar] [CrossRef]
  3. Aly, M. Real time detection of lane markers in urban streets. In Proceedings of the 2008 IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 4–6 June 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 7–12. [Google Scholar]
  4. Pan, X.; Shi, J.; Luo, P.; Wang, X.; Tang, X. Spatial as deep: Spatial cnn for traffic scene understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  5. Qin, Z.; Wang, H.; Li, X. Ultra fast structure-aware deep lane detection. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part XXIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 276–291. [Google Scholar]
  6. Gurghian, A.; Koduri, T.; Bailur, S.V.; Carey, K.J.; Murali, V.N. Deeplanes: End-to-end lane position estimation using deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 38–45. [Google Scholar]
  7. Huval, B.; Wang, T.; Tandon, S.; Kiske, J.; Song, W.; Pazhayampallil, J.; Andriluka, M.; Rajpurkar, P.; Migimatsu, T.; Cheng-Yue, R.; et al. An empirical evaluation of deep learning on highway driving. arXiv 2015, arXiv:1504.01716. [Google Scholar]
  8. Lee, S.; Kim, J.; Shin Yoon, J.; Shin, S.; Bailo, O.; Kim, N.; Lee, T.H.; Seok Hong, H.; Han, S.H.; So Kweon, I. Vpgnet: Vanishing point guided network for lane and road marking detection and recognition. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1947–1955. [Google Scholar]
  9. Huang, Y.; Chen, S.; Chen, Y.; Jian, Z.; Zheng, N. Spatial-temporal based lane detection using deep learning. In Proceedings of the Artificial Intelligence Applications and Innovations: 14th IFIP WG 12.5 International Conference, AIAI 2018, Rhodes, Greece, 25–27 May 2018; Proceedings 14. Springer: Berlin/Heidelberg, Germany, 2018; pp. 143–154. [Google Scholar]
  10. Song, Y.; He, Z.; Qian, H.; Du, X. Vision transformers for single image dehazing. IEEE Trans. Image Process. 2023, 32, 1927–1941. [Google Scholar] [CrossRef] [PubMed]
  11. Lu, L.; Xiong, Q.; Chu, D.; Xu, B. MixDehazeNet: Mix Structure Block For Image Dehazing Network. arXiv 2023, arXiv:2305.17654. [Google Scholar]
  12. Gui, J.; Cong, X.; Cao, Y.; Ren, W.; Zhang, J.; Zhang, J.; Cao, J.; Tao, D. A comprehensive survey and taxonomy on single image dehazing based on deep learning. ACM Comput. Surv. 2023, 55, 1–37. [Google Scholar] [CrossRef]
  13. McCartney, E.J. Optics of the Atmosphere: Scattering by Molecules and Particles; John Wiley and Sons, Inc.: New York, NY, USA, 1976. [Google Scholar]
  14. Narasimhan, S.G.; Nayar, S.K. Contrast restoration of weather degraded images. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 713–724. [Google Scholar] [CrossRef]
  15. Nayar, S.K.; Narasimhan, S.G. Vision in bad weather. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; IEEE: Piscataway, NJ, USA, 1999; Volume 2, pp. 820–827. [Google Scholar]
  16. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar] [PubMed]
  17. Fattal, R. Dehazing using color-lines. Acm Trans. Graph. (TOG) 2014, 34, 1–14. [Google Scholar] [CrossRef]
  18. Liu, J.; Liu, W.; Sun, J.; Zeng, T. Rank-one prior: Toward real-time scene recovery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14802–14810. [Google Scholar]
  19. Zhu, Q.; Mai, J.; Shao, L. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [PubMed]
  20. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. Dehazenet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198. [Google Scholar] [CrossRef] [PubMed]
  21. Zhang, H.; Patel, V.M. Densely connected pyramid dehazing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3194–3203. [Google Scholar]
  22. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. Aod-net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778. [Google Scholar]
  23. Ren, W.; Ma, L.; Zhang, J.; Pan, J.; Cao, X.; Liu, W.; Yang, M.H. Gated fusion network for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3253–3261. [Google Scholar]
  24. Guo, C.L.; Yan, Q.; Anwar, S.; Cong, R.; Ren, W.; Li, C. Image dehazing transformer with transmission-aware 3d position embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5812–5820. [Google Scholar]
  25. Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12299–12310. [Google Scholar]
  26. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Nashville, TN, USA, 20–25 June 2021; pp. 10012–10022. [Google Scholar]
  27. Tang, J.; Li, S.; Liu, P. A review of lane detection methods based on deep learning. Pattern Recognit. 2021, 111, 107623. [Google Scholar] [CrossRef]
  28. Chougule, S.; Koznek, N.; Ismail, A.; Adam, G.; Narayan, V.; Schulze, M. Reliable multilane detection and classification by utilizing CNN as a regression network. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; pp. 740–752. [Google Scholar]
  29. Chiu, K.Y.; Lin, S.F. Lane detection using color-based segmentation. In Proceedings of the IEEE Proceedings. Intelligent Vehicles Symposium, Las Vegas, NV, USA, 6–8 June 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 706–711. [Google Scholar]
  30. Zheng, T.; Fang, H.; Zhang, Y.; Tang, W.; Yang, Z.; Liu, H.; Cai, D. Resa: Recurrent feature-shift aggregator for lane detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 3547–3554. [Google Scholar]
  31. Zheng, T.; Huang, Y.; Liu, Y.; Tang, W.; Yang, Z.; Cai, D.; He, X. Clrnet: Cross layer refinement network for lane detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 898–907. [Google Scholar]
  32. Han, J.; Deng, X.; Cai, X.; Yang, Z.; Xu, H.; Xu, C.; Liang, X. Laneformer: Object-aware row-column transformers for lane detection. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022; Volume 36, pp. 799–807. [Google Scholar]
  33. Shah, S.; Dey, D.; Lovett, C.; Kapoor, A. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. arXiv 2017, arXiv:1705.05065. [Google Scholar] [CrossRef]
  34. Chen, L.; Lu, X.; Zhang, J.; Chu, X.; Chen, C. Hinet: Half instance normalization network for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 182–192. [Google Scholar]
  35. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  36. Liu, X.; Ma, Y.; Shi, Z.; Chen, J. Griddehazenet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7314–7323. [Google Scholar]
  37. Chen, D.; He, M.; Fan, Q.; Liao, J.; Zhang, L.; Hou, D.; Yuan, L.; Hua, G. Gated context aggregation network for image dehazing and deraining. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 7–11 January 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1375–1383. [Google Scholar]
  38. Dong, J.; Pan, J. Physics-based feature dehazing networks. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part XXX 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 188–204. [Google Scholar]
  39. Neven, D.; De Brabandere, B.; Georgoulis, S.; Proesmans, M.; Van Gool, L. Towards end-to-end lane detection: An instance segmentation approach. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 286–291. [Google Scholar]
  40. Hou, Y.; Ma, Z.; Liu, C.; Loy, C.C. Learning lightweight lane detection cnns by self attention distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1013–1021. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
