Proceeding Paper

Survey on Comprehensive Visual Perception Technology for Future Air–Ground Intelligent Transportation Vehicles in All Scenarios †

Department of Automotive Engineering, School of Transportation Science and Engineering, Beihang University, Beijing 102206, China
* Author to whom correspondence should be addressed.
Presented at the 2nd International Conference on Green Aviation (ICGA 2024), Chengdu, China, 6–8 November 2024.
Eng. Proc. 2024, 80(1), 50; https://doi.org/10.3390/engproc2024080050
Published: 30 May 2025
(This article belongs to the Proceedings of 2nd International Conference on Green Aviation (ICGA 2024))

Abstract

As an essential part of the low-altitude economy, low-altitude vehicles are a cornerstone of its development and a strategically important emerging industry. However, existing two-dimensional perception schemes for autonomous road vehicles struggle to meet the key requirements of all-scenario perception for low-altitude vehicles, such as global high-precision map construction in three-dimensional space, the detection and identification of traffic participants in the local environment, and the extraction of key visual information under extreme conditions. It is therefore urgent to explore the development and verification of all-scenario general-purpose sensing technology for low-altitude intelligent vehicles. This paper surveys the literature on vision-based perception technology for urban rail transit and low-altitude flight environments and summarizes the research status and innovations from five aspects, namely the environment perception algorithm based on visual SLAM, the environment perception algorithm based on BEV, the environment perception algorithm based on image enhancement, the performance optimization of perception algorithms using cloud computing, and the rapid deployment of perception algorithms using edge nodes. Future optimization directions for this topic are also proposed.

1. Introduction

Urban rail transit and the low-altitude economy are key components of the modern national transportation system, and their development level directly affects the implementation of national strategy and the prosperity of regional economies. The low-altitude economy is a comprehensive economic form driven by various low-altitude flight activities of manned and unmanned aircraft, which continues to promote the integration and development of related fields. At this year’s National Two Sessions, the low-altitude economy was written into the government work report for the first time, once again highlighting its important position in national economic development. In July, the Ministry of Industry and Information Technology, the Ministry of Science and Technology, the Ministry of Finance, and the Civil Aviation Administration issued the Implementation Plan for the Innovative Application of General Aviation Equipment (2024–2030) (hereinafter referred to as the “Plan”), which proposes to promote the formation of a low-altitude economy with a trillion-yuan market scale by 2030. As a frontier direction of the low-altitude economy, flying cars have not only become a global research hotspot but also herald a potential trillion-yuan market, and their development is of immeasurable value for building a comprehensive three-dimensional transportation network.
Visual perception technology is the core of flying car autonomous driving and autonomous docking guidance technology. Flying cars need accurate environment perception algorithms to achieve safe and efficient navigation. These algorithms must be able to handle multi-scale obstacle detection, large-scale environment mapping, and adaptation to complex dynamic environments. The evolution of vision algorithms has undergone a transformation from traditional image processing to deep learning innovation, and although existing research has laid the foundation for the development of vision algorithms, current technologies still face significant challenges in terms of real-time performance, accuracy, and robustness.
In this paper, the literature on vision-based general perception technology for urban rail transit and low-altitude flight environments is reviewed. The paper summarizes the research status and innovations from five aspects, namely the environment perception algorithm based on visual SLAM, the environment perception algorithm based on BEV, the environment perception algorithm based on image enhancement, the performance optimization of perception algorithms using cloud computing, and the rapid deployment of perception algorithms using edge nodes.

2. Domestic and International Research

2.1. Environment Perception Algorithm Based on Visual SLAM

Compared with LiDAR, another widely used sensor, visual SLAM is more suitable for the positioning and perception of flying cars in low-altitude environments. Vision sensors can capture images at higher frame rates, allowing algorithms to provide more frequent positioning information. To take full advantage of the flexibility of flying cars, high-frame-rate odometry is needed to ensure control accuracy. In addition, vision sensors are able to capture rich texture information, which gives the flying car’s perception system a high-level understanding of the environment, providing it with the ability to perform smarter tasks such as object tracking, semantic segmentation, and implicit reconstruction. Although filter-based visual SLAM cannot handle large-scale scenes over long periods of time, a Smooth Variable Structure Filter (SVSF) was adopted in [1,2] to solve the SLAM problem. The filter demonstrates exceptional resilience to uncertain parameters and unpredictable noise characteristics. Both techniques outperform conventional filter-based approaches in accuracy and stability. The Adaptive Smooth Variable Structure Filter (ASVSF), introduced in [3], incorporates a covariance matrix to assess the estimated uncertainty of the original SVSF, thereby enhancing its positioning reliability, particularly in environments with fluctuating noise disturbances. In addition, dynamic environments are dealt with by removing dynamic information in [4]. Recently, a novel monocular-inertial SLAM algorithm utilizing SVSF was introduced in [5] to achieve accurate real-world localization for UAV navigation. Overall, this SVSF-based SLAM approach demonstrates superior performance compared to conventional filtering techniques and is able to deal with uncertainty and high noise levels.
Non-textured environments such as white walls, open spaces, and long tunnels seriously degrade the performance of feature-based visual SLAM algorithms. Direct SLAM techniques are typically employed in these settings and can be categorized into the following three distinct types: dense, semi-dense, and sparse. As illustrated in Figure 1, DTAM [6] employs a keyframe-based approach to generate a detailed depth map by minimizing overall photometric discrepancies. A keyframe-based framework is also employed in [7] to build dense maps and track RGB-D cameras by minimizing photometric (intensity) and geometric (depth) errors. The literature [8] proposes a semi-dense monocular visual odometry method, which models the inverse depth of each pixel using a Gaussian probability distribution; this model is propagated from one frame to the next, focusing particularly on areas with significant changes in brightness. LSD-SLAM [9] performs the odometry part of the SLAM pipeline by maintaining a global map consisting of a keyframe pose graph and the associated probabilistic semi-dense depth maps, which reduces accumulated drift and scale drift in larger-scale estimation. SVO [10] employs a semi-direct method for tracking camera motion and initially estimates the camera’s position by aligning sparse image models, thus reducing the brightness discrepancy between corresponding pixels. Then, the reprojection error is minimized by feature alignment to achieve joint optimization of pose and structure. The former approach has been significantly expanded [11] to accommodate various camera setups, edge detection, and additional camera types by implementing comprehensive photometric adjustments, such as exposure duration, lens shading, and non-linear response curves. DSO [12] introduces a sparse and direct visual odometry method, demonstrating superior results compared to dense or semi-dense techniques. A new formulation for modeling photometric parameters minimizes photometric errors and jointly optimizes camera pose, affine brightness parameters, inverse depth values, and camera intrinsic parameters.
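To make the direct formulation concrete, the following minimal sketch (Python with NumPy) computes the per-pixel photometric residuals that direct methods of this kind minimize over the camera pose; the pinhole model, variable names, and nearest-neighbour image lookup are simplifying assumptions made for illustration and do not reproduce any cited implementation.

```python
import numpy as np

def photometric_residuals(I_ref, I_cur, depth_ref, K, T_cur_ref, pixels):
    """Per-pixel photometric error used by direct methods (illustrative sketch).

    I_ref, I_cur : grayscale images as float arrays (H x W)
    depth_ref    : (N,) depth of each selected reference pixel
    K            : 3x3 pinhole intrinsic matrix
    T_cur_ref    : 4x4 pose of the reference frame expressed in the current frame
    pixels       : (N, 2) integer (u, v) coordinates selected in the reference frame
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = pixels[:, 0].astype(float), pixels[:, 1].astype(float)
    z = depth_ref
    # Back-project the reference pixels to 3D and transform them into the current frame.
    pts_ref = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z, np.ones_like(z)], axis=0)
    pts_cur = T_cur_ref @ pts_ref
    # Project into the current image (nearest-neighbour lookup keeps the sketch short;
    # real systems interpolate and also model affine brightness changes).
    u2 = np.clip(np.rint(fx * pts_cur[0] / pts_cur[2] + cx).astype(int), 0, I_cur.shape[1] - 1)
    v2 = np.clip(np.rint(fy * pts_cur[1] / pts_cur[2] + cy).astype(int), 0, I_cur.shape[0] - 1)
    # The residual vector that direct odometry minimises over T_cur_ref.
    return I_cur[v2, u2] - I_ref[pixels[:, 1], pixels[:, 0]]
```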
Despite the effectiveness of direct SLAM algorithms in addressing texture-less environments, they are not without drawbacks. For instance, methods that rely on structural elements, such as points, lines, and surfaces, are employed to manage scenarios where point features are insufficient. PL-SLAM [4,5], an extension of ORB-SLAM, introduces a novel initialization technique that solely depends on line correspondences, enabling the estimation of an initial map from three sequential frames. Refs. [5,6,7,8,9,10,11,12,13,14,15,16,17,18] have developed several advanced algorithms that integrate both point and line features, leveraging lines to boost resilience and precision, particularly in areas lacking texture. Beyond these improvements, the low-texture problem can also be alleviated by capturing more information about the environment through an increased camera field of view, using fisheye, omnidirectional [19,20], and multi-camera configurations [21,22,23].
In conventional visual SLAM, the primary focus is on static settings. Nevertheless, real surroundings are inherently intricate and ever-changing. The presence of moving elements in a scene can disrupt position tracking and lead to inaccuracies. Consequently, addressing the SLAM challenge in such dynamic conditions has become a significant area of interest, underpinning numerous practical uses. To tackle this issue, various SLAM methods have been developed, broadly categorized into two groups. One approach involves identifying and eliminating moving points or entities during the initial tracking phase; this strategy treats dynamic data as anomalies and focuses solely on static information, thereby simplifying dynamic problems into static ones. While this method is straightforward and efficient, it neglects the potential benefits of dynamic data. An alternative approach involves tracking moving objects in real time during SLAM execution. By constructing a map that includes both the fixed background and moving elements, this technique leverages dynamic data to achieve significantly greater precision compared to simply discarding it. Furthermore, object-centric SLAM extracts environmental semantic details for both static and dynamic entities, enabling joint optimization through the creation of a coherent, object-level representation of the surroundings.
Random Sample Consensus (RANSAC) [24] is a widely employed technique for eliminating outliers and enhancing system robustness. This approach randomly samples data points to fit the model that contains the largest number of inliers. Algorithms such as PTAM, ORB-SLAM, and many other visual SLAM systems utilize RANSAC to filter out anomalies, which keeps them stable when only minor dynamic elements are present but breaks down when a significant portion of the image is in motion [25]. An adaptive RANSAC algorithm based on prior knowledge has been introduced to handle scenes with multiple dynamic points. This method is similar to conventional RANSAC but considers the distribution of inliers to achieve a more precise model fit. The aforementioned algorithms primarily rely on RANSAC for outlier removal [26]. A deep learning-based RGB-D SLAM system has been proposed that uses keyframes with static features to weight points and edges, thereby determining their static or dynamic nature; this enables frame-to-keyframe alignment for accurate motion recovery (Figure 2) [27]. A dense scene flow representation is utilized to identify moving objects: the algorithm first produces rough estimates using a standard odometry method and then refines the results by discarding outliers. To address object-level dynamic anomalies [28], convolutional neural networks (CNNs) are employed for the image segmentation of previously known dynamic objects. Built on the ORB-SLAM2 framework, this method uses a Mask R-CNN module to segment dynamic objects. Additionally, it maintains a map of static points and synthesizes frames that exclude dynamic objects using a background reconstruction module. These algorithms employ straightforward yet effective methods to perform SLAM in dynamic environments, significantly reducing the impact of dynamic outliers.
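The short sketch below illustrates the RANSAC loop described above on a toy line-fitting problem; in a SLAM front end the same loop would be run on feature correspondences with a homography or essential-matrix model. All thresholds and names are illustrative assumptions.

```python
import numpy as np

def ransac_line(points, n_iters=200, inlier_thresh=1.0, seed=0):
    """Minimal RANSAC sketch: fit y = a*x + b while rejecting outliers.

    `points` is an (N, 2) array; outliers here play the role that features on
    moving objects play in a SLAM front end.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:                      # degenerate minimal sample, skip it
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the consensus set; the rejected points are treated as dynamic outliers.
    a, b = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return a, b, best_inliers
```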
For the practical implementation of flying vehicles, it is essential to integrate simultaneous localization and mapping (SLAM) with moving object tracking (MOT) to enable both fundamental operations and the deeper environmental comprehension required for sophisticated tasks such as autonomous navigation [29]. Theoretically, a unified mathematical framework has been devised to tackle the SLAM and MOT problems jointly, providing a robust basis. Practically, an algorithm has been crafted to handle perception, motion, and data association. ClusterSLAM [30] introduces the back end of a stereo vision SLAM system that leverages both static and dynamic reference points. By grouping the motion of dynamic rigid elements, it facilitates decoupled factor graph optimization, thereby estimating camera movement, static landmarks, and dynamic rigid-body motion. ClusterSLAM, however, depends heavily on the accuracy of landmark tracking and association; ClusterVO [31] is therefore designed as a comprehensive pipeline for camera and object motion estimation. This method extracts ORB features and semantic bounding boxes and establishes multi-level probabilistic associations. To classify landmarks into rigid moving entities, heterogeneous CRF components are employed, and state estimation is achieved through sliding-window bundle adjustment. DynaSLAM2 [32] and VDO-SLAM [33] incorporate camera pose, static and dynamic points, and object dynamics into BA factor graph optimization, harnessing dynamic data for superior performance.
While the above methods show great potential for low-altitude localization and perception, many challenges remain, including real-time performance limitations, poor adaptability to dynamic environments, difficult feature extraction in low-texture environments, and the complexity of multi-sensor data fusion, hardware resource optimization, outlier removal, obstacle avoidance strategy evaluation, and deep learning integration.

2.2. Environment Perception Algorithm Based on BEV

A conventional and direct method for transforming perspective views into bird’s-eye views is to leverage the inherent geometric projection relationship between the two. This technique is referred to as a geometry-centric approach. Earlier studies can be categorized into the following two main types, depending on their strategy for connecting these perspectives: those that use homography and those that rely on depth information (Figure 3).
As data-driven methodologies in computer vision have progressed, various deep learning techniques have been developed to improve BEV perception by tackling the PV-BEV transformation problem. These methods can be categorized into the following three primary categories based on their view conversion strategies: depth-centric, MLP-centric, and converter-centric approaches. In deep learning-based solutions, the key strategy is to transform 2D features into 3D space using either explicit or implicit depth estimation. For every pixel in an image, a ray is projected from the camera and intersects with real-world objects. Rather than directly mapping these pixels to the BEV, the approach involves calculating the depth distribution for each pixel, using this information to project the 2D features into 3D space, and then deriving the BEV representation through dimensional reduction.
Multiple hypotheses have been posited regarding the depth, including a single explicit value, a uniform distribution along the ray, or a categorical distribution. Depth supervision is obtained either from explicit depth ground truth or from the supervision of the downstream task. Considering the substantial advances of deep neural networks in tackling computer vision challenges by acting as complex mapping functions that transform inputs into diverse outputs, a straightforward method is to employ variational encoder–decoders or MLPs to map PV features onto the BEV. While the MLP-centric technique is straightforward to implement, it struggles to generalize in complex scenarios involving occlusions and multi-view setups. In essence, this method follows a bottom–up strategy, handling the view conversion sequentially. Alternatively, the transformer-driven strategy adopts a top–down methodology, constructing BEV queries directly and leveraging a cross-attention mechanism to locate the corresponding features in the perspective images. To cater to a range of downstream tasks, sparse, dense, or hybrid queries have been proposed. These transformer-based techniques exhibit robust relationship modeling and data association, achieving remarkable results.
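As a rough illustration of the depth-centric lifting step, the sketch below spreads 2D image features along camera rays according to a per-pixel categorical depth distribution (in the spirit of lift-and-splat-style methods); the shapes and names are assumptions, and the subsequent geometric “splat” into BEV cells is omitted.

```python
import numpy as np

def lift_to_frustum(feat_2d, depth_logits):
    """Depth-centric lifting sketch: spread image features along camera rays.

    feat_2d      : (C, H, W) image features from the PV backbone
    depth_logits : (D, H, W) per-pixel scores over D discrete depth bins
    Returns a (C, D, H, W) frustum of depth-weighted features; a full pipeline
    would then "splat" this frustum into BEV cells using the camera geometry.
    """
    # Categorical depth distribution per pixel (softmax over the depth axis).
    d = np.exp(depth_logits - depth_logits.max(axis=0, keepdims=True))
    d = d / d.sum(axis=0, keepdims=True)
    # Outer product: each feature channel is distributed along its ray in
    # proportion to the estimated depth probability of each bin.
    return feat_2d[:, None, :, :] * d[None, :, :, :]
```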
IPM was the pioneering effort to warp a frontal image into a top-down perspective, making it an intuitive choice for pre- or post-processing. This transformation involves applying a camera-rotation homography followed by anisotropic scaling [34]. The homography matrix can be calculated from the camera’s intrinsic and extrinsic parameters. Certain approaches [35] leverage convolutional neural networks (CNNs) to extract semantic features from the perspective images, estimating vertical vanishing points and ground-plane vanishing lines (horizon lines) to determine the homography matrix. After IPM, a wide array of downstream perception tasks can be executed on bird’s-eye view (BEV) images, including optical flow estimation, detection, segmentation, motion prediction, and planning. VPOE [36] utilizes a YOLO detector [37] as the detection backbone to estimate vehicle position and orientation in the BEV. Using a synthetic dataset, [38] also projects dashboard camera detections onto a BEV occupancy map via IPM. In real-world scenarios where camera parameters may be unknown, TrafCam3D [39] introduces a robust homography based on a dual-view network architecture to reduce IPM distortion.
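A minimal sketch of the geometric IPM warp described above is given below, assuming a pinhole camera and a flat ground plane; the BEV pixel layout and metric scale are hypothetical choices made only for the example.

```python
import numpy as np
import cv2

def ipm_bev(image, K, R, t, bev_size=(400, 400), scale=20.0):
    """Illustrative IPM sketch: warp a frontal camera image onto the ground plane.

    K    : 3x3 camera intrinsics
    R, t : extrinsics mapping ground (world) coordinates into the camera frame
    scale: BEV pixels per metre (assumed value)
    For points on the ground plane Z = 0, the image projection reduces to the
    homography H = K [r1 r2 t].
    """
    H_ground_to_img = K @ np.column_stack([R[:, 0], R[:, 1], t])
    # Hypothetical BEV layout: x forward, y to the left, origin at the
    # bottom-centre of the BEV image, `scale` pixels per metre.
    w, h = bev_size
    S = np.array([[0.0, -1.0 / scale, h / scale],
                  [-1.0 / scale, 0.0, w / (2.0 * scale)],
                  [0.0, 0.0, 1.0]])
    H_bev_to_img = H_ground_to_img @ S
    # warpPerspective expects the map from the source (frontal) image to the BEV.
    return cv2.warpPerspective(image, np.linalg.inv(H_bev_to_img), bev_size)
```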
Certain approaches opt to utilize IPM for altering feature maps during the network training phase, as opposed to incorporating IPM in the pre- or post-processing stages. Cam2BEV [40] leverages IPM to convert the feature maps from several onboard cameras into a comprehensive bird’s-eye view (BEV) semantic map. MVNet [41] employs IPM to project 2D features into a unified BEV space, integrating multi-view data and addressing occlusion issues in pedestrian detection with large convolution kernels, whereas 3D LaneNet [42] is designed to forecast the 3D configuration of lanes from a single image without any assumptions about camera elevation. It also trains additional network components under supervision to estimate the homography matrix, subsequently applying projection transformations across various scales of feature maps. Gu et al. [43] use 2D detection predictions to globally refine the 3D boxes and introduce corresponding losses to embed geometric constraints between the 2D and BEV spaces.
Given the substantial disparity and severe distortions between the front view and the bird’s-eye view, IPM alone is insufficient for producing undistorted images or semantic maps. To enhance the realism of generated BEV features or images, generative adversarial networks (GANs) [44] are employed. BridgeGAN [45] leverages the homography view as an intermediate viewpoint and introduces a multi-GAN framework to learn cross-view transformations between the PV and the BEV. Later research [46] tackled the monocular 3D detection challenge by performing a 2D analysis in the BEV and aligning it with the ground-plane estimation to achieve the final 3D detection. MonoLayout [47] also employs GANs to generate data for occluded areas and to estimate the layout of scenes containing dynamic objects. RAP [48] proposes an incremental GAN to improve the reliability of IPM for the forward-facing camera by using robust real-world markers, which effectively reduces the distortion of distant objects.
Traditional geometry-based methods may not be sufficient to produce distortion-free images or semantic maps when dealing with significant gaps and severe deformations between the perspective view and the BEV. Deep learning-based approaches improve the 2D-to-3D conversion through explicit or implicit depth estimation but may face difficulties in generalization under occlusion and multi-view input settings. MLP-based approaches are easy to implement but may struggle to handle complex scenes effectively. While transformer-based approaches have powerful relationship modeling capabilities, they may require substantial computational resources. In addition, generative adversarial networks (GANs), while used to enhance the realism of BEV features or images, present challenges in precise alignment and detail preservation. Therefore, although current methods are effective under specific conditions, further research and innovation are needed to achieve robust, accurate, and real-time BEV perception.

2.3. Environment Perception Algorithm Based on Image Enhancement

Image enhancement involves accentuating valuable information within an image while diminishing or eliminating irrelevant information, tailored to specific requirements. The goal is to refine the image to better align with human visual perception or to facilitate machine analysis. Over the past three decades, a plethora of image enhancement techniques have emerged. Among the most prevalent are histogram equalization, the wavelet transform, partial differential equation methods, and the Retinex approach, which is grounded in color constancy theory. Histogram equalization (HE) is the foundational technique, known for its simplicity, ease of implementation, and efficient performance. It enhances image contrast and expands the dynamic range by adjusting the gray-level probability density function (PDF) to approximate a uniform distribution [49,50,51]. Numerous HE-based advancements have been developed, each with unique attributes. For instance, the brightness-preserving bi-histogram equalization (BBHE) method preserves the mean brightness of the image [52], while the equal-area dualistic sub-image histogram equalization (DSIHE) and two-dimensional spatial information entropy histogram equalization (SEHE) aim to maximize information entropy, thereby mitigating detail loss [53,54,55,56]. The minimum mean brightness error bi-histogram equalization (MMBEBHE) minimizes the mean brightness error between the enhanced and original images [54].
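For reference, the following few lines sketch plain global histogram equalization on an 8-bit grayscale image; this is the baseline technique that the BBHE/DSIHE/MMBEBHE variants refine, not an implementation of any specific cited method.

```python
import numpy as np

def histogram_equalization(gray):
    """Global histogram equalization sketch for an 8-bit grayscale image."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    # Normalize the cumulative distribution and use it as a gray-level mapping table.
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    lut = np.rint(cdf * 255).astype(np.uint8)
    return lut[gray]
```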
The logarithmic-mapping-based histogram equalization (LMHE) technique aligns the enhanced image more closely with human visual perception [55]. The wavelet transform (WT) method for image enhancement separates the image into low-frequency and high-frequency components, applying distinct enhancements to each to emphasize the image’s finer details [57,58,59,60,61,62,63,64]. The knee function and gamma correction function are used to enhance the low-frequency image, which can effectively improve the overall brightness of the image [65]. Better enhancement effects can also be achieved by enhancing the image contrast defined in the wavelet domain and the singular value matrix of the image [66,67]. By combining the curvelet transform with the wavelet transform, the noise generated during wavelet-based image enhancement can be effectively removed [68].
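The sketch below illustrates the wavelet-domain idea in its simplest form: a one-level DWT, amplification of the detail sub-bands, and reconstruction (using the PyWavelets package). The gain value and wavelet choice are assumptions made only for illustration.

```python
import numpy as np
import pywt

def wavelet_enhance(gray, gain=1.8, wavelet="haar"):
    """Wavelet-domain enhancement sketch: amplify high-frequency detail sub-bands."""
    # One-level DWT: approximation (low-frequency) + horizontal/vertical/diagonal details.
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float64), wavelet)
    # Boost the detail coefficients, keep the approximation, and reconstruct.
    enhanced = pywt.idwt2((cA, (gain * cH, gain * cV, gain * cD)), wavelet)
    return np.clip(enhanced, 0, 255).astype(np.uint8)
```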
The partial differential equation (PDE) technique for image enhancement boosts the image’s contrast range [69,70,71,72,73,74]. By employing the Total Variation model, PDE-based image enhancement (TVPDE) ensures that the enhanced image not only exhibits heightened contrast but also closely resembles the original, preserving its intricate details [75]. In addition, there are many improved algorithms for the gradient function in PDE-based image enhancement, and all of them have achieved good enhancement effects [73,76]. The Retinex algorithm enhances images by isolating the intrinsic color of objects, eliminating the impact of lighting variations in the original image and thus improving overall image quality [77,78,79,80,81,82]. Using a Markov random field (MRF) to solve for the reflectance component of an object can effectively eliminate the “halo artifact” caused by uneven illumination [83]. By integrating alternating direction optimization (ADO) with the Fast Fourier Transform (FFT), it becomes possible to compute both the illumination and reflectance components of an object simultaneously. This approach enhances the robustness of the Retinex image enhancement algorithm’s outcomes [84]. Additionally, employing a sparse representation technique to model the reflectance component, followed by utilizing a learned dictionary to capture detailed image features within the reflectance, can further improve the enhancement results [85].
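A single-scale Retinex sketch is given below to make the reflectance/illumination decomposition concrete: the illumination is approximated by a Gaussian-blurred copy of the image and removed in the log domain. The Gaussian scale is an assumed value, and the cited MRF, ADO/FFT, and dictionary-based refinements are not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(gray, sigma=80.0):
    """Single-scale Retinex sketch on a grayscale image.

    The illumination is approximated by a heavily blurred copy of the image and
    removed in the log domain, leaving an estimate of the reflectance.
    """
    img = gray.astype(np.float64) + 1.0                 # avoid log(0)
    illumination = gaussian_filter(img, sigma=sigma)
    reflectance = np.log(img) - np.log(illumination)
    # Stretch the reflectance back to the displayable 0-255 range.
    r = (reflectance - reflectance.min()) / (reflectance.max() - reflectance.min() + 1e-12)
    return np.rint(r * 255).astype(np.uint8)
```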
Among the above algorithms, traditional image enhancement algorithms may not be able to effectively deal with complex image environments in uneven illumination or dynamic scenes. Although the algorithms based on HE (Histogram Equalization) and its improved version improve image contrast and dynamic range, they may result in the loss of image details. The wavelet transform image enhancement algorithm may introduce noise when highlighting detail information. The partial differential equation (PDE) image enhancement algorithm may sacrifice image detail while improving contrast. Although the Retinex algorithm can remove the effect of illumination, it may have shortcomings in processing speed and robustness. Therefore, although current image enhancement algorithms perform well in specific applications, further research and innovation are needed to achieve more accurate and robust image perception, especially in terms of the algorithm’s adaptability, detail retention, real-time processing capabilities, and generalization ability to complex environments.

2.4. Use Cloud Computing to Optimize the Performance of Perception Algorithms

Due to the limited energy and carrying capacity of UAVs, modules such as visual perception usually require a large amount of computation when conducting performance optimization and online perception verification, which inevitably affects the endurance of the UAVs themselves. In view of this, when optimizing and verifying the deployment of the visual perception algorithm, the work of online verification and deployment should be moved to the edge for decentralized processing, while performance optimization should be moved to the cloud for centralized computation [86]. With the unique resource service model of cloud computing, resources in the resource pool can be conveniently shared anywhere at any time [44]. The traditional cloud computing framework is categorized into IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software as a Service) (Figure 4).
In terms of performance optimization, the above advantages of cloud computing allow computing resources to be allocated flexibly. An algorithm leveraging ant colony optimization can enhance the distribution of computational resources: when allocating computing resources, the computing quality of potentially available nodes is first predicted; then, according to the characteristics of the cloud computing environment and by analyzing the influence of factors such as bandwidth occupancy, link quality, and response time on allocation, the ant colony optimization algorithm is used to obtain an optimal set of computing resources [48]. Closer examination reveals that the primary goal of cloud computing optimization is to allocate tasks to cloud resources in a manner that ensures the most efficient scheduling while minimizing resource usage. Scholars utilize meta-heuristic strategies to develop versatile algorithms capable of addressing scheduling and optimization issues effectively [87]. Task scheduling in cloud computing also employs heuristic algorithms that identify optimal solutions by simulating genetic operations, mutation, and natural selection, based on the evolutionary principles of living things [88,89]. Heuristic algorithms have been shown to significantly reduce job execution time, with significant advantages in reducing task execution time, improving resource utilization, reducing energy consumption, and improving throughput [90,91,92,93].
Adaptive particle swarm optimization (PSO) effectively minimizes job processing time, enhances throughput, and boosts the average resource utilization rate (ARUR). Furthermore, the Linearly Decreasing Adaptive Inertia Weight (LDAIW) technique is implemented to refine the adaptive inertia weight [94].
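To illustrate how such swarm-based schedulers operate, the toy sketch below uses PSO with a linearly decreasing inertia weight to assign tasks to virtual machines so as to minimize makespan; the task lengths, VM speeds, and all hyperparameters are assumptions, and the sketch is far simpler than the cited ARUR/LDAIW formulations.

```python
import numpy as np

def pso_schedule(task_len, vm_speed, n_particles=30, n_iters=100, c1=2.0, c2=2.0):
    """Toy PSO task scheduler: map tasks to VMs to minimise makespan (illustrative only)."""
    rng = np.random.default_rng(0)
    task_len, vm_speed = np.asarray(task_len, float), np.asarray(vm_speed, float)
    n_tasks, n_vms = len(task_len), len(vm_speed)

    def makespan(assign):
        # Completion time of the busiest VM under a given task-to-VM assignment.
        load = np.zeros(n_vms)
        for t, v in enumerate(assign):
            load[v] += task_len[t] / vm_speed[v]
        return load.max()

    pos = rng.uniform(0, n_vms - 1, (n_particles, n_tasks))   # continuous positions
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([makespan(np.rint(p).astype(int)) for p in pos])
    gbest = pbest[pbest_cost.argmin()].copy()
    for it in range(n_iters):
        w = 0.9 - 0.5 * it / n_iters   # linearly decreasing inertia weight (LDAIW-style assumption)
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, n_vms - 1)
        cost = np.array([makespan(np.rint(p).astype(int)) for p in pos])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return np.rint(gbest).astype(int), pbest_cost.min()

# Example: 8 tasks of different lengths scheduled on 3 VMs with different speeds.
# assignment, best_makespan = pso_schedule([4, 7, 2, 9, 5, 3, 6, 8], [1.0, 2.0, 1.5])
```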
A specialized version of the Distributed Grey Wolf Optimizer (DGWO) is designed for task scheduling on virtual machines. This approach employs the largest order value (LOV) technique to transform the continuous candidate solutions produced by DGWO into discrete ones. In contrast to particle swarm optimization (PSO) and the standard Grey Wolf Optimizer, this variant of DGWO excels at rapidly assigning tasks to virtual machines, achieving the shortest completion times among the tested algorithms [95].

2.5. Use Edge Nodes to Rapidly Deploy Sensing Algorithms

In recent years, numerous sophisticated strategies have been developed to address the task offloading challenge in edge computing. These strategies can be broadly categorized into the following two types based on the presence or absence of a centralized control hub: the centralized approach and the decentralized approach. Centralized methods encompass techniques such as convex optimization, heuristic algorithms, and machine learning, whereas decentralized methods include federated learning and blockchain. Some current research efforts focus on converting non-convex problems into convex optimization problems to derive near-optimal solutions. For instance, a Lyapunov-based decomposition method is employed in [96] to minimize the response times in a networked edge computing environment. In [97], the computational offloading issue in UAV scenarios is tackled using successive convex approximation to resolve the non-convex optimization problem. Beyond convex optimization, other studies apply heuristic algorithms to solve task offloading. For example, refs. [98,99] utilized a non-dominated sorting genetic algorithm (NSGA) to handle multi-objective optimization challenges.
With the advancement of artificial intelligence, an increasing number of studies are leveraging machine learning to address the task offloading challenge in edge computing. In [100], a distributed offloading technique that integrates parallel computing and deep learning was introduced. For edge computing settings, ref. [101] employed Deep Q Networks (DQN) to optimize computational performance while deciding on offload strategies. DQN was also utilized in [102,103,104] to identify the most effective offloading strategy in IoT environments. Furthermore, ref. [105] framed the task offloading issue as a multi-label classification problem, employing deep supervised learning to resolve it. Besides the centralized approaches, decentralized machine learning techniques have gained popularity in recent years. Federated learning, a distributed machine learning approach, enables model training across multiple edge devices or servers without sharing local data, enhancing communication efficiency and privacy [106]. Ref. [107] introduced an online solution based on actor–critic federated learning (AC-Federate) for the fine-grained offloading challenges in multi-access edge computing, offering significant benefits in latency and energy consumption. Ref. [108] suggested an asynchronous joint offloading method using DQN to tackle the task offloading problem in vehicular networks. To handle task offloading in challenging conditions, ref. [109] presented a decision model that combined deep reinforcement learning with federated learning, incorporating blockchain for secure parameter sharing and distribution in federated learning. Ref. [110] examined the issue of computational offloading and resource allocation in dynamic environments, proposing a solution that merged deep deterministic policy gradient (DDPG) with federated learning.
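As a simplified stand-in for the DQN-based offloading schemes cited above, the sketch below uses tabular Q-learning to learn a binary local-versus-edge offloading decision from a latency-based reward; the state space, latency model, and hyperparameters are all illustrative assumptions rather than any cited formulation.

```python
import numpy as np

def q_learning_offload(latency_local, latency_edge, n_states=4, n_episodes=500,
                       alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning toy for a binary offloading decision (local vs. edge).

    States are coarse network-load levels, actions are {0: run locally,
    1: offload to the edge}, and the reward is the negative task latency.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for _ in range(n_episodes):
        s = rng.integers(n_states)                          # observed network-load level
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        # Assumed latency model: offloading latency grows with network load.
        latency = latency_local if a == 0 else latency_edge * (1.0 + s / n_states)
        reward = -latency
        s_next = rng.integers(n_states)                     # load evolves independently here
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
    return Q   # Q[s].argmax() gives the learned offloading decision for load level s
```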

3. Reference Datasets

3.1. Cross-View Time (CVT) Dataset

Each set of data includes (1) geo-tagged ground images, (2) the captured pictures and their corresponding capture times, and (3) synchronized high-altitude images. The most distinctive feature of this dataset is its coverage of three seasons (spring, summer, and winter) and of the morning, noon, and early morning of each day, providing environmental changes at different times of the day (Figure 5).

3.2. NPS-Drones

A US Navy drone dataset of 50 videos, captured in HD resolution (1920 × 1080 and 1280 × 760) using a GoPro-3 camera, is shown in Figure 6.
The smallest, typical, and largest dimensions of the drones are 10 × 8, 16.2 × 11.6, and 65 × 21 pixels, respectively. The dataset comprises a total of 70,250 frames.

3.3. FL-Drones

There are photos of the drones indoors and outdoors; the drones are of different shapes and barely changed shape even in the successive frames.
The smallest, typical, and largest dimensions of the drones are 9 × 9, 25.5 × 16.4, and 259 × 197 pixels, respectively, and the frame resolutions are 640 × 480 and 752 × 480. The dataset comprises 14 videos, totaling 38,948 frames.

3.4. DTB70

The DTB70 UAV tracking photo dataset is shown in Figure 7. Because these photos were taken while moving at high speed, the pictures are blurred, making it difficult to identify and track the drones. This dataset can therefore be used for target recognition training in high-speed scenes.

3.5. UAV–Human Dataset

This dataset was gathered by an aerial drone across diverse urban and rural settings over a period of three months, both during daylight and nighttime, encompassing a broad spectrum of subjects, environments, illumination conditions, weather patterns, obstructions, camera movements, and drone flight orientations.
The dataset is large and useful for industry as well. It can be used for motion recognition and posture estimation, allowing for the understanding and parsing of human behavior in drone images, as well as the classification of the meaning of human behavior in images. The dataset comprises 67,428 multi-modal video clips and 119 motion recognition targets, 22,476 frames for estimating postures, 41,290 frames along with 1144 identities for re-identifying individuals, and 22,263 frames for recognizing attributes (Figure 8).

4. Comparative Analysis

4.1. Environment Perception Algorithm Based on Visual SLAM

In general, compared with traditional filtering methods, the SVSF-based SLAM algorithm shows excellent performance, and although direct SLAM algorithms can effectively handle texture-less environments, their limitations remain. Therefore, the proposed SLAM algorithms, while showing great potential in low-altitude positioning and perception, still face many challenges, including limitations in real-time performance, adaptability to dynamic environments, feature extraction in non-textured environments, and so on.

4.2. Environment Awareness Algorithm Based on BEV

A traditional and straightforward solution for converting perspective views into BEVs is to take advantage of the natural geometric projection relationship between them. Among the deep learning-based approaches, the transformer-based method has strong relationship modeling ability and data association characteristics and has achieved good performance. However, the substantial disparity and severe distortion between the front perspective and the overhead view mean that IPM by itself is insufficient for producing images or semantic maps without any warping. The current methods are effective under certain conditions, so further research and innovation are still needed to achieve robust, accurate, and real-time BEV perception.

4.3. Environment Perception Algorithm Based on Image Enhancement

Image enhancement involves accentuating valuable data within an image while diminishing or eliminating irrelevant details to meet particular requirements. Among the histogram equalization technique, the wavelet transform method, the partial differential equation approach, and the Retinex algorithm grounded in color constancy principles, traditional image enhancement algorithms may not be able to deal effectively with complex imaging environments under uneven illumination or in dynamic scenes. Although the algorithms based on HE (Histogram Equalization) and its improved versions increase image contrast and dynamic range, they may cause the loss of image detail information. The wavelet transform image enhancement algorithm may introduce noise when highlighting detailed information. The partial differential equation (PDE) image enhancement algorithm may sacrifice image detail while improving contrast. Although the Retinex algorithm can remove the effect of illumination, it may fall short in processing speed and robustness. Therefore, although the current image enhancement algorithms perform well in specific applications, further research and innovation are needed to achieve more accurate and robust image perception, especially in terms of the algorithm’s adaptability, detail retention, real-time processing capabilities, and generalization ability in complex environments.

4.4. Use Cloud Computing to Optimize the Performance of Perception Algorithms

An algorithm for managing computational resource distribution, inspired by ant colony optimization, can significantly enhance the efficiency of resource allocation. Heuristic algorithms have been shown to significantly reduce job execution time and have marked advantages in reducing task execution time, improving resource utilization, decreasing energy usage, and enhancing processing capacity. Adaptive particle swarm optimization (PSO) decreases task execution duration while also boosting throughput and the average resource utilization rate (ARUR). In contrast with standard PSO and the Grey Wolf Optimizer, DGWO allocates tasks to virtual machines more rapidly and achieves superior completion times compared to the alternative algorithms.

4.5. Rapid Deployment of Perception Algorithms Using Edge Nodes

In the context of centralized and distributed strategies, contemporary studies often convert non-convex challenges into convex optimization tasks to derive approximate solutions. Federated learning, a form of distributed machine learning, enables the training of models across various dispersed edge devices or servers without sharing local data, thereby enhancing communication efficiency and safeguarding privacy. For the specific issue of fine-grained offloading in multi-access edge computing, an actor–critic-based federated learning approach (AC-Federate) has shown significant benefits in reducing latency and energy usage.

5. Summary

To sum up, the current problems include insufficient perception and target recognition ability of the algorithms in complex environment scenarios. In the operating environment of low-altitude flight and rail transit equipment, especially under perception conditions such as rapid scene changes and complex weather, sensor images become blurred and the data become abnormal. Strengthening the data processing and perception ability of visual perception algorithms in complex environments, as well as the data fusion ability across multiple sensors, and solving problems such as the insufficient rapid-deployment capability of the algorithms, are the directions for innovation and breakthroughs in this field in the future.

Author Contributions

Conceptualization, G.R.; methodology, F.C.; software, F.Z.; validation, G.R.; formal analysis, G.R.; investigation, G.R.; resources, S.Y.; data curation, F.Z.; writing—original draft preparation, G.R.; writing—review and editing, F.Z.; visualization, G.R.; supervision, B.X.; project administration, F.Z.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been supported by the National Natural Science Foundation of China (No. U22A202101).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Demim, F.; Nemra, A.; Louadj, K. Robust SVSF-SLAM for unmanned vehicle in unknown environment. IFAC-Pap. 2016, 49, 386–394. [Google Scholar] [CrossRef]
  2. Ahmed, A.; Abdelkrim, N.; Mustapha, H. Smooth variable structure filter VSLAM. IFAC-Pap. 2016, 49, 205–211. [Google Scholar] [CrossRef]
  3. Demim, F.; Boucheloukh, A.; Nemra, A.; Louadj, K.; Hamerlain, M.; Bazoula, A.; Mehal, Z. A new adaptive smooth variable structure filter SLAM algorithm for unmanned vehicle. In Proceedings of the 2017 6th International Conference on Systems and Control (ICSC), Batna, Algeria, 7–9 May 2017; pp. 6–13. [Google Scholar]
  4. Demim, F.; Nemra, A.; Boucheloukh, A.; Louadj, K.; Hamerlain, M.; Bazoula, A. Robust SVSF-SLAM algorithm for unmanned vehicle in dynamic environment. In Proceedings of the 2018 International Conference on Signal, Image, Vision and Their Applications (SIVA), Guelma, Algeria, 26–27 November 2018; pp. 1–5. [Google Scholar]
  5. Elhaouari, K.; Allam, A.; Larbes, C. Robust IMU-Monocular-SLAM For Micro Aerial Vehicle Navigation Using Smooth Variable Structure Filter. Int. J. Comput. Digit. Syst. 2023, 14, 1063–1072. [Google Scholar]
  6. Newcombe, R.A.; Lovegrove, S.J.; Davison, A.J. DTAM: Dense tracking and mapping in real-time. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2320–2327. [Google Scholar]
  7. Kerl, C.; Sturm, J.; Cremers, D. Dense visual SLAM for RGB-D cameras. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 2100–2106. [Google Scholar]
  8. Engel, J.; Sturm, J.; Cremers, D. Semi-dense visual odometry for a monocular camera. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1449–1456. [Google Scholar]
  9. Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast semi-direct monocular visual odometry. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 15–22. [Google Scholar]
  10. Forster, C.; Zhang, Z.; Gassner, M.; Werlberger, M.; Scaramuzza, D. SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Trans. Robot. 2016, 33, 249–265. [Google Scholar] [CrossRef]
  11. Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625. [Google Scholar] [CrossRef]
  12. Pumarola, A.; Vakhitov, A.; Agudo, A.; Sanfeliu, A.; Moreno-Noguer, F. PL-SLAM: Real-time monocular visual SLAM with points and lines. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4503–4508. [Google Scholar]
  13. Gomez-Ojeda, R.; Moreno, F.A.; Zuniga-Noel, D.; Scaramuzza, D.; Gonzalez-Jimenez, J. PL-SLAM: A stereo SLAM system through the combination of points and line segments. IEEE Trans. Robot. 2019, 35, 734–746. [Google Scholar] [CrossRef]
  14. Fu, Q.; Yu, H.; Lai, L.; Wang, J.; Peng, X.; Sun, W.; Sun, M. A robust RGB-D SLAM system with points and lines for low texture indoor environments. IEEE Sens. J. 2019, 19, 9908–9920. [Google Scholar] [CrossRef]
  15. Yang, S.; Scherer, S. Direct monocular odometry using points and lines. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3871–3877. [Google Scholar]
  16. Gomez-Ojeda, R.; Briales, J.; Gonzalez-Jimenez, J. PL-SVO: Semi-direct monocular visual odometry by combining points and line segments. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 4211–4216. [Google Scholar]
  17. Zuo, X.; Xie, X.; Liu, Y.; Huang, G. Robust visual SLAM with point and line features. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 1775–1782. [Google Scholar]
  18. Shu, F.; Wang, J.; Pagani, A.; Stricker, D. Structure plp-slam: Efficient sparse mapping and localization using point, line and plane for monocular, rgb-d and stereo cameras. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 2105–2112. [Google Scholar]
  19. Zhang, Z.; Rebecq, H.; Forster, C.; Scaramuzza, D. Benefit of large field-of-view cameras for visual odometry. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 801–808. [Google Scholar]
  20. Huang, H.; Yeung, S.K. 360vo: Visual odometry using a single 360 camera. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 5594–5600. [Google Scholar]
  21. Matsuki, H.; Von Stumberg, L.; Usenko, V.; Stuckler, J.; Cremers, D. Omnidirectional DSO: Direct sparse odometry with fisheye cameras. IEEE Robot. Autom. Lett. 2018, 3, 3693–3700. [Google Scholar] [CrossRef]
  22. Harmat, A.; Sharf, I.; Trentini, M. Parallel tracking and mapping with multiple cameras on an unmanned aerial vehicle. In Proceedings of the 5th International Conference, Intelligent Robotics and Applications, ICIRA 2012, Montreal, QC, Canada, 3–5 October 2012; Proceedings, Part I 5. Springer: Berlin/Heidelberg, Germany, 2012; pp. 421–432. [Google Scholar]
  23. Kuo, J.; Muglikar, M.; Zhang, Z.; Scaramuzza, D. Redesigning SLAM for arbitrary multi-camera systems. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 2116–2122. [Google Scholar]
  24. Derpanis, K.G. Overview of the RANSAC Algorithm. Image Rochester NY 2010, 4, 2–3. [Google Scholar]
  25. Tan, W.; Liu, H.; Dong, Z.; Zhang, G.; Bao, H. Robust monocular SLAM in dynamic environments. In Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Adelaide, SA, Australia, 1–4 October 2013; pp. 209–218. [Google Scholar]
  26. Li, S.; Lee, D. RGB-D SLAM in dynamic environments using static point weighting. IEEE Robot. Autom. Lett. 2017, 2, 2263–2270. [Google Scholar] [CrossRef]
  27. Alcantarilla, P.F.; Yebes, J.J.; Almazan, J.; Bergasa, L.M. On combining visual SLAM and dense scene flow to increase the robustness of localization and mapping in dynamic environments. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; pp. 1290–1297. [Google Scholar]
  28. Bescos, B.; Facil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083. [Google Scholar] [CrossRef]
  29. Wang, C.C.; Thorpe, C.; Thrun, S.; Hebert, M.; Durrant-Whyte, H. Simultaneous localization, mapping and moving object tracking. Int. J. Robot. Res. 2007, 26, 889–916. [Google Scholar] [CrossRef]
  30. Huang, J.; Yang, S.; Zhao, Z.; Lai, Y.K.; Hu, S.M. Clusterslam: A slam backend for simultaneous rigid body clustering and motion estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5875–5884. [Google Scholar]
  31. Huang, J.; Yang, S.; Mu, T.J.; Hu, S.M. ClusterVO: Clustering moving instances and estimating visual odometry for self and surroundings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2168–2177. [Google Scholar]
  32. Bescos, B.; Campos, C.; Tardos, J.D.; Neira, J. DynaSLAM II: Tightly-coupled multi-object tracking and SLAM. IEEE Robot. Autom. Lett. 2021, 6, 5191–5198. [Google Scholar] [CrossRef]
  33. Zhang, J.; Henein, M.; Mahony, R.; Ila, V. VDO-SLAM: A visual dynamic object-aware SLAM system. arXiv 2020, arXiv:2005.11052. [Google Scholar]
  34. Kim, Y.; Kum, D. Deep learning based vehicle position and orientation estimation via inverse perspective mapping image. In Proceedings of the IEEE Intelligent Vehicles Symposium, Paris, France, 9–12 June 2019; pp. 317–323. [Google Scholar]
  35. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  36. Palazzi, A.; Borghi, G.; Abati, D.; Calderara, S.; Cucchiara, R. Learning to Map Vehicles into Bird’s Eye View. In Proceedings of Image Analysis and Processing-ICIAP; Battiato, S., Gallo, G., Schettini, R., Stanco, F., Eds.; Springer: Cham, Switzerland, 2017; Volume 10484, pp. 233–243. [Google Scholar]
  37. Zhu, M.; Zhang, S.; Zhong, Y.E. Monocular 3d vehicle detection using uncalibrated traffic cameras through homography. In Proceedings of the IROS, Prague, Czech Republic, 21 September–1 October 2021; pp. 3814–3821. [Google Scholar]
  38. Reiher, L.; Lampe, B.; Eckstein, L. A sim2real deep learning approach for the transformation of images from multiple vehicle-mounted cameras to a semantically segmented image in bird’s eye view. In Proceedings of the ITSC, Rhodes, Greece, 20–23 September 2020; pp. 1–7. [Google Scholar]
  39. Hou, Y.; Zheng, L.; Gould, S. Multiview detection with feature perspective transformation. In Proceedings of the ECCV; Lecture Notes in Computer Science; Vedaldi, A., Bischof, H., Brox, T., Frahm, J., Eds.; Springer Nature: Glasgow, UK, 2020; Volume 12352, pp. 1–18. [Google Scholar]
  40. Garnett, N.; Cohen, R.; Pe’er, T.E. 3d-lanenet: End-to-end 3d multiple lane detection. In Proceedings of the ICCV, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2921–2930. [Google Scholar]
  41. Gu, J.; Wu, B.; Fan, L.; Huang, J.; Cao, S.; Xiang, Z.; Hua, X. Homography loss for monocular 3d object detection. In Proceedings of the CoRR, New Orleans, LA, USA, 19–24 June 2022; pp. 1080–1089. [Google Scholar]
  42. Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.E. Generative adversarial networks: An overview. IEEE Signal Process. Mag. 2018, 35, 53–65. [Google Scholar] [CrossRef]
  43. Zhu, X.; Yin, Z.; Shi, J.E. Generative adversarial frontal view to bird view synthesis. In Proceedings of the 3DV, Verona, Italy, 5–8 September 2018; pp. 454–463. [Google Scholar]
  44. Jiongjiong, G. Cloud Computing Architecture Technology and Practice; Tsinghua University Press: Beijing, China, 2014. [Google Scholar]
  45. Hairong, Z.; Jing, G. Cloud Computing Technology and Application. In Proceedings of the Inner Mongolia Communication; No. 1–2; Inner Mongolia Branch of China Unicom, Black Mountain Branch of China Unicom: Beijing, China, 2014; pp. 106–110. [Google Scholar]
  46. Luo, J.-Z.; Jia-Hui, J.; Song, A.-B.; Dong, F. Cloud computing: Architecture and key technologies. J. Commun. 2011, 32, 3–21. [Google Scholar]
  47. Buyya, R.; Yeo, C.S.; Venugopal, S.; Broberg, J.; Brandic, I. Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility. Future Gener. Comput. Syst. 2009, 25, 599–616. [Google Scholar] [CrossRef]
  48. Yu, H.; Jun, Z.; Wenxin, H. Ant colony optimization resource allocation algorithm based on cloud computing environment. J. East China Norm. Univ. 2010, 2010, 127–134. [Google Scholar]
  49. Zimmerman, J.B.; Pizer, S.M.; Staab, E.V.; Perry, J.R.; McCartney, W.; Brenton, B.C. An evaluation of the effectiveness of adaptive histogram equalization for contrast enhancement. IEEE Trans. Med. Imaging 1988, 7, 304–312. [Google Scholar] [CrossRef]
  50. Wang, Q.; Ward, R.K. Fast image/video contrast enhancement based on weighted thresholded histogram equalization. IEEE Trans. Consum. Electron. 2007, 53, 757–764. [Google Scholar] [CrossRef]
  51. Yang, S.; Oh, J.H.; Park, Y. Contrast enhancement using histogram equalization with bin underflow and bin overflow. In Proceedings of the 2003 International Conference on Image Processing, Barcelona, Spain, 14–17 September 2003; pp. 881–884. [Google Scholar]
  52. Kim, Y.T. Contrast enhancement using brightness preserving bi-histogram equalization. IEEE Trans. Consum. Electron. 1997, 43, 1–8. [Google Scholar]
  53. Wan, Y.; Chen, Q.; Zhang, B.M. Image enhancement based on equal area dualistic sub-image histogram equalization method. IEEE Trans. Consum. Electron. 1999, 45, 68–75. [Google Scholar]
  54. Chen, S.; Ramli, A. Minimum mean brightness error bi-histogram equalization in contrast enhancement. IEEE Trans. Consum. Electron. 2003, 49, 1310–1319. [Google Scholar] [CrossRef]
  55. Kim, W.K.; You, J.M.; Jeong, J. Contrast enhancement using histogram equalization based on logarithmic mapping. Opt. Eng. 2012, 51, 067002. [Google Scholar] [CrossRef]
  56. Celik, T. Spatial entropy-based global and local image contrast enhancement. IEEE Trans. Image Process. 2014, 23, 5298–5308. [Google Scholar] [CrossRef]
  57. Lin, N. Wavelet transform and image processing. Hefei: Univ. Sci. Technol. China Press 2010, 6, 151–152. [Google Scholar]
  58. Ding, X. Research on Image Enhancement Based on Wavelet Transform. Master’s Thesis, Anhui University, Hefei, China, 2010. [Google Scholar]
  59. Demirel, H.; Anbarjafari, G. Image resolution enhancement by using discrete and stationary wavelet decomposition. IEEE Trans. Image Process. 2011, 20, 1458–1460. [Google Scholar] [CrossRef] [PubMed]
  60. Demirel, H.; Anbarjafari, G. Discrete wavelet transform based satellite image resolution enhancement. IEEE Trans. Geosci. Remote Sens. 2011, 49, 1997–2004. [Google Scholar] [CrossRef]
  61. Łoza, A.; Bull, D.R.; Hill, P.R.; Achim, A.M. Automatic contrast enhancement of low-light images based on local statistics of wavelet coefficients. Digit. Signal Process. 2013, 23, 1856–1866. [Google Scholar]
  62. Cho, D.; Bui, T.D. Fast image enhancement in compressed wavelet domain. Signal Process. 2014, 98, 295–307. [Google Scholar] [CrossRef]
  63. Nasri, M.; Pour, H.N. Image denoising in the wavelet domain using a new adaptive thresholding function. Neurocomputing 2009, 72, 1012–1025. [Google Scholar] [CrossRef]
  64. Chang, S.; Yu, B.; Vetterli, M. Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Process. 2000, 9, 1532–1546. [Google Scholar] [CrossRef] [PubMed]
  65. Bhandari, A.K.; Kumar, A.; Singh, G.K. Improved knee transfer function and gamma correction based method for contrast and brightness enhancement of satellite image. Int. J. Electron. Commun. 2015, 69, 579–589. [Google Scholar] [CrossRef]
  66. Se, E.K.; Jong, J.J.; Il, K.E. Image contrast enhancement using entropy scaling in wavelet domain. Signal Process. 2016, 127, 1–11. [Google Scholar]
  67. Demirel, H.; Ozcinar, C.; Anbarjafari, G. Satellite image contrast enhancement using discrete wavelet transform and singular value decomposition. IEEE Geosci. Remote Sens. Lett. 2010, 7, 333–337. [Google Scholar] [CrossRef]
  68. Bhutada, G.G.; Anand, R.S.; Saxena, S.C. Edge preserved image enhancement using adaptive fusion of images denoised by wavelet and curvelet transform. Digit. Signal Process. 2011, 21, 118–130. [Google Scholar] [CrossRef]
  69. Bhat, P.; Zitnick, C.L.; Cohen, M.; Curless, B. Gradient Shop: A gradient-domain optimization framework for image and video filtering. ACM Trans. Graph. 2010, 29, 1–14. [Google Scholar] [CrossRef]
  70. Bhat, P.; Curless, B.; Cohen, M.; Zitnick, C.L. Fourier analysis of the 2D screened poisson equation for gradient domain problems. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12–18 October 2008; pp. 114–128. [Google Scholar]
  71. Kong, P. Research on Image Denoising and Enhancement Based on Partial Differential Equations. Master’s Thesis, Nanjing University of Science and Technology, Nanjing, China, 2012. [Google Scholar]
  72. Wang, S. Research on Image Enhancement Technology Based on Partial Differential Equations. Master’s Thesis, Changchun University of Science and Technology, Changchun, China, 2012. [Google Scholar]
  73. Chao, W. Research on Image Processing Techniques Based on Variational Problems and Partial Differential Equations. Ph.D. Thesis, University of Science and Technology of China, Hefei, China, 2007. [Google Scholar]
  74. Chan, T.F.; Shen, J.H. Image Processing and Analysis; Chen, W.B.; Cheng, J., Translators; Science Press: Beijing, China, 2011. [Google Scholar]
  75. Kim, J.H.; Kim, J.H.; Jung, S.W.; Noh, C.K.; Ko, S.J. Novel contrast enhancement scheme for infrared image using detail-preserving stretching. Opt. Eng. 2011, 50, 077002. [Google Scholar] [CrossRef]
  76. Xizhen, H.; Jian, Z. Enhancing image texture and contrast with partial differential equations. Opt. Precis. Eng. 2012, 20, 1382–1388. [Google Scholar]
  77. Land, E. The Retinex. Am. Sci. 1964, 52, 247–264. [Google Scholar]
  78. Land, E.; Mccann, J. Lightness and Retinex theory. J. Opt. Soc. Am. 1971, 61, 1–11. [Google Scholar] [CrossRef]
  79. Gonzales, A.M.; Grigoryan, A.M. Fast Retinex for color image enhancement: Methods and algorithms. Proc. SPIE 2015, 9411, 129–140. [Google Scholar]
  80. Shen, C.T.; Hwang, W.L. Color image enhancement using Retinex with robust envelope. In Proceedings of the 16th IEEE International Conference on Image Processing, Cairo, Egypt, 7–10 November 2009; pp. 3141–3144. [Google Scholar]
  81. Wharton, E.; Panetta, K.; Agaian, S. Human visual system-based image enhancement and logarithmic contrast measure. IEEE Trans. Syst. Man Cybern. Part B-Cybern. 2008, 38, 174–188. [Google Scholar]
  82. Jobson, D.; Rahman, Z.; Woodell, G. A multiscale Retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process. 1997, 6, 965–976. [Google Scholar] [CrossRef]
  83. Zhao, H.; Xiao, C.; Yu, J.; Bai, L. Retinex nighttime color image enhancement under Markov random field model. Opt. Precis. Eng. 2014, 22, 1048–1055. [Google Scholar] [CrossRef]
  84. Fu, X.; Lin, Q.; Guo, W.; Huang, Y.; Zeng, D.; Ding, X. A novel Retinex algorithm based on alternation direction optimization. In Proceedings of the Sixth International Symposium on Precision Mechanical Measurements, Guiyang, China, 8–12 August 2013; pp. 761–766. [Google Scholar]
  85. Chang, H.B.; Ng, M.K.; Wang, W.; Zeng, T. Retinex image enhancement via a learned dictionary. Opt. Eng. 2015, 54, 013107. [Google Scholar] [CrossRef]
  86. Shi, W.; Pallis, G.; Xu, Z. Edge Computing [Scanning the Issue]. Proc. IEEE 2019, 107, 1474–1481. [Google Scholar] [CrossRef]
  87. Saidi, K.; Bardou, D. Task scheduling and VM placement to resource allocation in cloud computing: Challenges and opportunities. Clust. Comput. 2023, 26, 3069–3087. [Google Scholar] [CrossRef]
  88. Alkhanak, E.N.; Lee, S.P. A hyper-heuristic cost optimization approach for scientific workflow scheduling in cloud computing. Futur. Gener. Comput. Syst. 2018, 86, 480–506. [Google Scholar] [CrossRef]
  89. Gupta, I.; Kaswan, A.; Jana, P.K. A flower pollination algorithm based task scheduling in cloud computing. In Proceedings of the Computational Intelligence, Communications, and Business Analytics: First International Conference, CICBA 2017, Kolkata, India, 24–25 March 2017; Revised Selected Papers, Part II. Springer: Singapore, 2017; pp. 97–107. [Google Scholar]
  90. Mandal, R.; Mondal, M.K.; Banerjee, S.; Srivastava, G.; Alnumay, W.; Ghosh, U.; Biswas, U. MECPVMS: An SLA aware energy-efficient virtual machine selection policy for green cloud computing. Clust. Comput. 2023, 26, 651–665. [Google Scholar] [CrossRef]
  91. Narendrababu Reddy, G.; Phani Kumar, S. Multi objective task scheduling algorithm for cloud computing using whale optimization technique. In Proceedings of the Smart and Innovative Trends in Next Generation Computing Technologies: 3rd International Conference, NGCT 2017, Dehradun, India, 30–31 October 2017; Revised Selected Papers, Part I 3. Springer: Singapore; pp. 286–297. [Google Scholar]
  92. Rimal, B.P.; Maier, M. Workflow scheduling in multi-tenant cloud computing environments. IEEE Trans. Parallel Distrib. Syst. 2016, 28, 290–304. [Google Scholar] [CrossRef]
  93. Zhang, L.; Li, K.; Li, C.; Li, K. Bi-objective workflow scheduling of the energy consumption and reliability in heterogeneous computing systems. Inf. Sci. 2017, 379, 241–256. [Google Scholar] [CrossRef]
  94. Nabi, S.; Ahmad, M.; Ibrahim, M.; Hamam, H. ADPSO: Adaptive PSO-based task scheduling approach for cloud computing. Sensors 2022, 22, 920. [Google Scholar] [CrossRef] [PubMed]
  95. Abed-Alguni, B.H.; Alawad, N.A. Distributed grey wolf optimizer for scheduling of workflow applications in cloud environments. Appl. Soft Comput. 2021, 102, 107113. [Google Scholar] [CrossRef]
  96. Deng, Y.; Chen, Z.; Yao, X.; Hassan, S.; Ibrahim, A.M. Parallel offloading in green and sustainable mobile edge computing for delay-constrained IoT system. IEEE Trans. Veh. Technol. 2019, 68, 12202–12214. [Google Scholar] [CrossRef]
  97. Li, M.; Cheng, N.; Gao, J.; Wang, Y.; Zhao, L.; Shen, X. Energy-efficient UAV-assisted mobile edge computing: Resource allocation and trajectory optimization. IEEE Trans. Veh. Technol. 2020, 69, 3424–3438. [Google Scholar] [CrossRef]
  98. Guo, F.; Zhang, H.; Ji, H.; Li, X.; Leung, V.C. An efficient computation offloading management scheme in the densely deployed small cell networks with mobile edge computing. IEEE/ACM Trans. Netw. 2018, 26, 2651–2664. [Google Scholar] [CrossRef]
  99. Xu, X.; Liu, Q.; Luo, Y.; Peng, K.; Zhang, X.; Meng, S.; Qi, L. A computation offloading method over big data for IoT-enabled cloud-edge computing. Futur. Gener. Comput. Syst. 2019, 95, 522–533. [Google Scholar] [CrossRef]
  100. Huang, L.; Feng, X.; Feng, A.; Huang, Y.; Qian, L.P. Distributed deep learning-based offloading for mobile edge computing networks. Mob. Netw. Appl. 2018, 27, 1123–1130. [Google Scholar] [CrossRef]
  101. Min, M.; Xiao, L.; Chen, Y.; Cheng, P.; Wu, D.; Zhuang, W. Learning-based computation offloading for IoT devices with energy harvesting. IEEE Trans. Veh. Technol. 2019, 68, 1930–1941. [Google Scholar] [CrossRef]
  102. Liu, X.; Yu, J.; Wang, J.; Gao, Y. Resource allocation with edge computing in IoT networks via machine learning. IEEE Internet Things J. 2020, 7, 3415–3426. [Google Scholar] [CrossRef]
  103. Wang, J.; Hu, J.; Min, G.; Zhan, W.; Ni, Q.; Georgalas, N. Computation offloading in multi-access edge computing using a deep sequential model based on reinforcement learning. IEEE Commun. Mag. 2019, 57, 64–69. [Google Scholar] [CrossRef]
  104. Zhang, K.; Zhu, Y.; Leng, S.; He, Y.; Maharjan, S.; Zhang, Y. Deep learning empowered task offloading for mobile edge computing in urban informatics. IEEE Internet Things J. 2019, 6, 7635–7647. [Google Scholar] [CrossRef]
  105. Yu, S.; Wang, X.; Langar, R. Computation offloading for mobile edge computing: A deep learning approach. In Proceedings of the 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Montreal, QC, Canada, 8–13 October 2017; pp. 1–6. [Google Scholar]
  106. Konečný, J.; McMahan, B.; Ramage, D. Federated optimization: Distributed optimization beyond the datacenter. arXiv 2015, arXiv:1511.03575. [Google Scholar]
  107. Liu, K.-H.; Hsu, Y.-H.; Lin, W.-N.; Liao, W. Fine-grained offloading for multi-access edge computing with actor-critic federated learning. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference (WCNC), Nanjing, China, 29 March–1 April 2021; pp. 1–6. [Google Scholar]
  108. Pan, C.; Wang, Z.; Liao, H.; Zhou, Z.; Wang, X.; Tariq, M.; Al-Otaibi, S. Asynchronous federated deep reinforcement learning-based URLLC-aware computation offloading in space-assisted vehicular networks. IEEE Trans. Intell. Transp. Syst. 2022, 24, 7377–7389. [Google Scholar] [CrossRef]
  109. Qu, G.; Wu, H.; Cui, N. Joint blockchain and federated learning-based offloading in harsh edge computing environments. In Proceedings of the International Workshop on Big Data in Emergent Distributed Environments, Virtual Event, China, 20 June 2021; pp. 1–6. [Google Scholar]
  110. Zhang, L.; Jiang, Y.; Zheng, F.-C.; Bennis, M.; You, X. Computation offloading and resource allocation in F-RANs: A federated deep reinforcement learning approach. In Proceedings of the 2022 IEEE International Conference on Communications Workshops (ICC Workshops), Seoul, South Korea, 20 May 2022; pp. 97–102. [Google Scholar]
Figure 1. Direct visual SLAM algorithm.
Figure 2. Flow of the keyframe-based visual SLAM algorithm.
Figure 3. Comparison of the depth distributions of LSS and OFT.
Figure 4. Cloud platform service hierarchy.
Figure 5. CVT scenario dataset.
Figure 6. NPS drone dataset.
Figure 7. Photo dataset captured by a UAV under high-speed movement.
Figure 8. Diversity of the UAV–Human dataset.