LiDAR and Camera Fusion Approach for Object Distance Estimation in Self-Driving Vehicles

The fusion of light detection and ranging (LiDAR) and camera data in real time is known to be a crucial process in many applications, such as autonomous driving, industrial automation, and robotics. Especially in the case of autonomous vehicles, the efficient fusion of data from these two types of sensors is important for estimating the depth of objects as well as for detecting objects at short and long distances. Because the two sensors capture different attributes of the environment simultaneously, integrating those attributes with an efficient fusion approach greatly benefits reliable and consistent perception of the environment. This paper presents a method to estimate the distance (depth) between a self-driving car and other vehicles, objects, and signboards on its path using an accurate fusion approach. Based on geometrical transformation and projection, low-level sensor fusion was performed between the camera and LiDAR using a 3D marker. The fusion information is then used to estimate the distance of objects detected by the RefineDet detector. Finally, the accuracy and performance of the sensor fusion and distance estimation approach were evaluated quantitatively and qualitatively in real-road and simulation scenarios. The proposed low-level sensor fusion, based on geometric transformation and projection for object distance estimation, thus proves to be a promising solution for enabling reliable and consistent environment perception for autonomous vehicles.


Introduction
In recent years, many researchers have become interested in multi-sensor data convergence and integration technologies based on sensors such as cameras, light detection and ranging (LiDAR), and radar, as sensing technologies for environment perception have developed [1]. In robots and autonomous systems in particular, the complementary modality between sensors is an important issue. Especially in intelligent vehicles, object detection must be fast and accurate enough to enable the control system to react and perform the required corrections. Although the current state-of-the-art literature describes highly accurate object detection systems using vision and range sensors, several issues remain to be solved in the context of autonomous driving. For example, estimating object distance is one limitation of current object detection approaches. Several approaches to object distance estimation have been proposed, depending on the modality of the sensors used, such as radar, LiDAR, or camera. Each sensor modality perceives the environment from a specific perspective and is limited to detecting certain attributes of objects. More specifically, vision-based approaches are robust and accurate in object detection but fail to estimate the distance of objects accurately. In contrast, LiDAR-based methods are very robust and accurate in measuring the distance of objects but are limited in their object classification ability. As an alternative, a few approaches are based on the stereo vision camera because of its ability to perceive the environment in 3D [2]. However, the main constraints are the real-time processing of dense frames and measurement and estimation errors [3]. Hence, considering the abilities and limitations of the camera and LiDAR sensors, a sensor-fusion-based approach appears to be ideal for detecting objects and estimating their distance from the vehicle's current position.
In addition, sensor fusion approaches are broadly classified into three main categories based on the level of data used for fusion, namely low-level fusion, feature-level fusion, and high-level fusion [4,5]. In low-level fusion, the raw measurements from the sensors are fused based on the sensors' physical locations, whereas feature-level fusion extracts certain features from the raw measurements through a series of preprocessing steps. In high-level fusion, each sensor independently carries out an object detection or tracking algorithm, and fusion is performed afterward. Although each fusion approach has well-known advantages and disadvantages compared to the others, low-level fusion appears to be better suited to autonomous vehicles because of its real-time performance and more precise fusion of data.
By fusing the advantages of the respective sensors, it is possible to solve the problem of complementary modality and overcome the limitations of each sensor. It is also possible to obtain a great deal of information, such as the type, height, width, and distance of an object in the surrounding environment, from the fused data acquired by these sensors [3]. For example, range sensors such as high-speed 3D LiDAR can be used with RGB cameras to enable a robot to perform various detection tasks. More specifically, a 3D LiDAR sensor can provide the 3D position and depth information of classified objects, whereas RGB cameras provide 2D position and color information. Therefore, real-world objects can be visualized by mapping the 3D position information onto the 2D image. However, to fuse the information, it is necessary to find the relative positions and orientations of the sensors. As sensor fusion technology is applied to various fields, the calibration issue between sensors has become increasingly important. In particular, the development of fusion technology using cameras and LiDAR in autonomous vehicles absolutely requires an accurate relative position (including posture and direction information) between the camera and the LiDAR sensor. This can be accomplished by finding the conversion matrix between the heterogeneous sensors, which is an extrinsic parameter problem. Therefore, to detect unique features in the point cloud data of the LiDAR and the image data of the camera, it is necessary to determine an accurate corresponding relationship between the sensors. In this paper, we propose a precise data fusion method between a camera and a LiDAR sensor to estimate object distance, which is necessary to enable a self-driving vehicle to drive along a real road by sensing its surrounding environment more precisely.
To estimate object distance information, we first fuse the data acquired from the camera and LiDAR sensor using the calibration matrix estimated by performing sensor calibration. In sensor fusion approaches, it is very important to consider the relative positions of the sensors in order to fuse the data accurately. After sensor data fusion, the distance of an object in the vehicle's path is estimated by utilizing the object detection output from the camera and the distance data from the LiDAR. Hence, the object detection process requires an accurate sensor fusion approach to integrate camera and LiDAR data. For the fusion of camera and LiDAR data, studies have shown that polygonal planes, such as a checkerboard or box, are necessary to perform the calibration between the camera and LiDAR. However, these methods either cause measurement errors or affect the calibration result whenever there are small changes, such as in the plane position or the sensors' relative positions. In particular, if the positional change between the sensors is large, it is difficult to accurately detect the edge of the plane and to precisely calibrate the sensors. In addition, remote targets with different shapes and colors may generate different fusion results because of the various experimental environments and the characteristics of the sensors. Therefore, to improve calibration accuracy, previously proposed methods mount the sensors as close to each other as possible. Moreover, it is difficult to apply previous calibration methods to an autonomous driving system, which requires the recognition of distant objects, because those methods arrange the sensors closely and detect only short-distance objects in the field of view. In this paper, correspondence between sensors positioned with large displacement was established by performing precise calibration using a 3D marker.
While calibration approaches are common, little research addresses sensor fusion for object distance estimation, and the existing work offers only limited experimental evaluation. Hence, understanding the necessity of the sensor fusion approach in the context of object distance estimation for autonomous vehicles, this article presents, in detail, a pipeline for object distance estimation in autonomous vehicles using a sensor fusion approach. It is based on a detailed experimental evaluation using 3D marker-based calibration and sensor-fusion-based distance estimation to recognize short- and long-distance objects, as required by an autonomous driving system. The experimental results and analysis based on real and simulated environments help the reader understand the ability of the sensor fusion approach in distance estimation and directions for further improvement. Figure 1 shows the overall process flow of the proposed technology, including the data-fusion-based depth estimation method and an alignment-score-based accuracy evaluation method. Section 2 describes previous sensor fusion methods and related techniques used for object distance estimation, and Section 3 presents our proposed method to fuse the data acquired by the camera and LiDAR of a self-driving vehicle, followed by a detailed description of the approach used to evaluate the effectiveness of the data fusion. Section 4 discusses the evaluation of the distance estimation method in terms of quantitative and qualitative analysis; detailed experimental results based on real and simulated environments are also presented to provide insightful information to the reader. Section 5 contains the conclusion and alludes to future work.

Related Work
Autonomous vehicles use various sensors, such as LiDAR, radar, cameras, and ultrasonic sensors, to map and recognize the environment surrounding the vehicle [5]. In this field, sensor fusion techniques have been proposed in particular for object detection, as well as in other areas such as localization and path planning. For instance, conventional LiDAR and radar fusion focuses on detecting moving objects such as vehicles and motorcycles; for object detection, the velocity extracted from radar Doppler information is fused with the width and length of vehicles obtained from LiDAR. In the article [6], the authors introduced RoadPlot-DATMO for moving object detection using multiple LiDAR sensors. However, object detection and tracking based on multiple LiDAR sensors is limited by its object classification ability. Among the more recent approaches, stereo-vision-based methods seem more suitable for generic object detection because of their ability to represent a scene in 3D, but real-time performance is a critical issue [7]. In addition, the collaborative fusion between the laser scanner and camera presented in [8] is limited to vehicle detection, as it is difficult to detect other objects such as pedestrians, motorcyclists, and so on.
As an alternative, considering real-time performance, object detection and classification ability, and accurate distance estimation, LiDAR and camera fusion techniques have been introduced for object detection based on different levels of data fusion. The fusion technique establishes a correspondence between the 3D points from the LiDAR and the objects detected by a camera to reduce the overall processing time. This is possible because sensor fusion compensates for the disadvantages of each sensor and improves robustness and detection accuracy. For example, the resolution of a typical camera is considerably higher than that of LiDAR, but the camera has a limited field of view and cannot estimate the distance to objects as precisely as LiDAR. In addition, a camera is very sensitive to lighting changes and requires complex image processing when used alone, while LiDAR has difficulty recognizing color and classifying objects in comparison with a camera. By using sensor fusion, however, it is possible not only to acquire complementary information about the environment surrounding an autonomous vehicle from sensor data with different characteristics, but also to overcome the limitations of each sensor and reduce the uncertainty of individual sensors. Therefore, an autonomous vehicle inevitably requires sensor fusion technology for safety and reliability. In the paper [3], the authors proposed cooperative fusion for multi-object detection by means of stereo vision and a laser scanner based on feature-level fusion. However, this approach requires the sensors to extract features from the raw measurements. Another fusion method, based on fuzzy logic, was proposed in the article [9]; it parses the image and point cloud data independently for object detection and performs fusion to obtain the distance of the object.
In this study, sensor data fusion at the low level is presented for robust object detection, and the method is verified with on-road experiments.
In order to fuse the camera and LiDAR data at the low level, the first essential step of the fusion process is extrinsic calibration between the sensors. This means that the geometrical parameters, such as the position and orientation of each sensor, must be determined by taking into account the relative position and orientation of the other sensors [10][11][12]. Therefore, calibration for fusing sensor data is performed by finding the correspondence between the 3D points and the 2D image pixels. The important point when fusing heterogeneous sensor data is to identify the features from each sensor and determine the geometric relation between the sensors [13]. A common solution is to recognize a target with the heterogeneous sensors and then match the data collected from different angles and positions.
In this paper, we focus on precise distance estimation as well as object detection in the path of an autonomous vehicle. In particular, in sensor data fusion, to estimate the depth of moving objects, the accuracy of calibration between sensors is very important. Most calibration methods check the correspondence between the two sensors by using external objects such as a trihedral rig [14][15][16], circles, board patterns [17,18], checkerboard, and others [19][20][21]. Typical methods for fusing the data of a camera with data from other sensors have used a calibration technique using checkerboard patterns. The board is used to fuse data acquired by the optical camera with that from the 2D laser scanner, and the calibration method is based on nonlinear least-squares optimization [22]. To avoid the many errors caused by the checkerboard pattern, a black circular plane board is used, which locates the object by estimating the 3D coordinates of the center of the circle and the vector normal to the plane [23]. In addition, most researchers use the well-known Levenberg-Marquardt algorithm to improve calibration accuracy when transforming LiDAR and camera coordinates [24,25]. Recently, many researchers have also proposed automatic calibration methods [26]. These methods find the target object because each sensor automatically detects the center of the object and circles on a plane. The random sample consensus (RANSAC) algorithm is used to extract the plane, and the iterative closest point (ICP) algorithm based on nonlinear optimization is employed to accurately refine the external parameters for data fusion [16,27]. Another approach to automatic calibration is to use a plane with four circular holes on a white background [28]. This enables 3D LiDAR point cloud data and camera images to detect the four holes automatically but requires high-resolution LiDAR. Another calibration method uses single-scan data of LiDAR and data captured once by the camera [29]. 
This requires multiple checkerboards and two cameras installed in different positions. An alternative calibration method uses a stereo camera and LiDAR. The method constructs a 3D scene from the camera data by using the speeded-up robust features (SURF) algorithm, and this is matched with the LiDAR data by applying the ICP algorithm to fuse the data [26,30,31]. These studies have focused only on improving the accuracy of data fusion between sensors for object detection. However, most of the previous methods performed calibration between sensors in a limited space, have not been shown to perform accurately in real-time experiments, and have not been compared with other methods. In addition, because conventional fusion methods perform calibration and data fusion with the sensors mounted as close to each other as possible to improve object detection accuracy, only objects located within a short distance can be recognized. Therefore, the previous methods are not suitable for autonomous vehicle platforms that need to recognize distant objects and detect them in real time.
In this study, we determine 3D marker types by carrying out various experiments to find markers that allow the data to be fused with high accuracy regardless of the relative positions of the sensors. This 3D marker, detectable at long range by both LiDAR and camera, is utilized for precise data fusion in the autonomous vehicle system. We then propose a method of fusing the camera and LiDAR data for detecting surrounding objects and a method of estimating the distance to objects detected in the fused data on a real road. In addition, we evaluate the performance of the data fusion method in various experiments while driving along an actual road using an autonomous vehicle platform. The experimental results show that the proposed methods can estimate the precise depth of objects and recognize objects accurately. Therefore, we demonstrate that the proposed methods of fusing camera and LiDAR data and estimating the depth of objects have the potential to be used in a self-driving vehicle.

Sensor Data Fusion for Self-driving Vehicles
This section describes the proposed method for self-driving vehicles in detail. The vehicle perceives its surroundings by capturing various physical attributes of the environment using the LiDAR and camera sensors. As shown in Figure 2, the first step of the algorithm involves calibrating the LiDAR and camera sensors to measure the sensor displacement using the 3D marker. Next, based on the calibration parameters and the camera's intrinsic parameters, the LiDAR point cloud is mapped onto the camera image. The accuracy and usefulness of the proposed method are evaluated by using pixel-to-point matching of the sensor data and an alignment score estimation of the matched data.

The Calibration Method of the Camera and LiDAR
The calibration of the LiDAR and the camera is performed in two consecutive steps in order to estimate the intrinsic and extrinsic calibration parameters. First, the camera intrinsic parameters are estimated by using the well-known checkerboard camera calibration method, and the LiDAR and camera extrinsic parameters are obtained by using a planar 3D marker. The actual sensor placement on the roof of the vehicle is graphically illustrated in Figure 3, and the sensors were arranged as shown in Figure 4. The sensors were mounted rigidly on top of the vehicle with mutual displacement. A Velodyne HDL 64E-S2 (Velodyne, San Jose, CA, USA) LiDAR sensor and a Sekonix SF3321 camera for video streaming were utilized in this work. The image data captured from the camera are represented using a 2D coordinate system (U, V), and the 3D point cloud generated from the raw measurements of the LiDAR sensor is represented using a 3D coordinate system (X, Y, Z). The main objective of the camera and LiDAR calibration is to compute the projective transformation matrix, which projects the 3D LiDAR points (X, Y, Z) onto the 2D image (U, V). The projective transformation can be formulated as in Equation (1):

s [U, V, 1]^T = I [R | t] [X, Y, Z, 1]^T, (1)

where I is the camera intrinsic matrix, R and t are the rotation and translation between the LiDAR and camera coordinate systems, and s is a scale factor.
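As an illustration, this projection can be sketched in Python with NumPy under the standard pinhole model; the function and variable names below are our own, not from the original implementation:

```python
import numpy as np

def project_lidar_to_image(points_xyz, K, R, t):
    """Project Nx3 LiDAR points (X, Y, Z) onto the 2D image plane (U, V).

    K : 3x3 camera intrinsic matrix, R : 3x3 rotation, t : length-3 translation.
    Returns the projected (U, V) coordinates and a mask of points kept
    (points behind the camera are discarded).
    """
    # Transform LiDAR points into the camera coordinate frame.
    cam = points_xyz @ R.T + t            # Nx3
    in_front = cam[:, 2] > 0              # keep only points in front of the camera
    cam = cam[in_front]
    # Perspective projection: [u, v, 1]^T ~ K [X, Y, Z]^T / Z
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, in_front
```

For example, with identity rotation and zero translation, a point on the optical axis projects exactly onto the principal point of the image.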

Estimation of Intrinsic and Extrinsic Parameters
In this paper, two different types of marker boards are used for the intrinsic and extrinsic parameter estimation. The internal characteristics of the camera, such as the focal length, skew distortion, and image center are estimated by the camera's intrinsic calibration using the checkerboard marker shown in Figure 5a. This is an essential step in calibrating the LiDAR and camera for data fusion, and the calibration results contain the camera matrix, distortion coefficients, and camera projection matrix.
After the estimation of the intrinsic parameters of the camera, it is necessary to determine the extrinsic parameters; i.e., the six-degrees-of-freedom (6DOF) relative transformation between the LiDAR and the camera for data fusion. The extrinsic parameters were determined by using a planar marker board of size 20 × 20 cm with four equally sized circular holes 22 cm in size, as shown in Figure 5b. The main objective is to determine the 6DOF transformation parameters by detecting the circles on the marker from both sensing devices. Before the marker detection process, it is important to ensure that the marker board is completely within the field of view of both sensors, as described in Figure 6.

Marker detection in the 3D point cloud
The circular holes in the planar marker board were detected in the point cloud data by using the edge detection algorithm described in the literature [26]. Circles can be detected more robustly by using the RANSAC sphere detection algorithm [30]. However, before detecting the circles, the preprocessing of the point cloud is necessary to remove unwanted data from the point cloud obtained from the raw sensor measurement. First, visible point cloud data with respect to the camera's field of view are extracted using the camera's intrinsic parameter I as in Equation (3). Figure 7 shows the visible point cloud highlighted in the RGB intensity values extracted from the camera image.
Finally, the point cloud of the planar board, from which the straight lines have been removed, retains very few points and is subjected to the RANSAC sphere detection algorithm to detect the four circles, as shown in Figure 8. After detecting the circles in the point cloud, the distances between the centers of adjacent circles and the radii of the circles are verified. The circle detection algorithm terminates if the circles are successfully verified; otherwise, the algorithm returns to the beginning, discarding the currently detected circles. After the four circles are successfully detected, the average radius R is estimated to compute the distance d in Equation (5).
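The verification step described above (checking the radii and the spacing of adjacent circle centers against the known marker geometry) can be sketched as follows; the tolerance value and function names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def verify_circles(centers, radii, expected_radius, spacing, tol=0.05):
    """Verify four candidate circles detected on the marker board.

    centers : 4x3 circle centers (m), radii : four detected radii (m).
    expected_radius and spacing are the known marker dimensions; tol is the
    allowed deviation in metres. Returns True if the candidates are plausible.
    """
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    if centers.shape[0] != 4:
        return False
    # All radii must match the known hole radius.
    if np.any(np.abs(radii - expected_radius) > tol):
        return False
    # Adjacent centers (the 4 smallest pairwise distances, i.e. the sides of
    # the square arrangement) must match the known spacing.
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    pairwise = np.sort(d[np.triu_indices(4, k=1)])   # 6 pairwise distances
    adjacent = pairwise[:4]                          # 4 sides (diagonals excluded)
    return bool(np.all(np.abs(adjacent - spacing) <= tol))
```

If verification fails, the detection loop would discard the candidates and restart, as described above.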

Marker detection in the 2D image
Marker detection in the camera images is performed using the Sobel edge detection operator, after which the circles are detected by using the Hough transform described in [32]. Figure 9 presents the circles detected in the camera image. The four circles detected in the image are utilized to estimate the calibration parameters, as explained in the following sections. The detected circles are verified against a pre-defined threshold, and finally the four valid circles are used to estimate the average radius r, which is used in Equation (5) for the estimation of the distance d.
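A minimal NumPy sketch of the Sobel gradient-magnitude step is shown below; a full pipeline would then run a circle Hough transform on this magnitude image. The implementation details here are ours, not the paper's:

```python
import numpy as np

def sobel_edges(img):
    """Apply the Sobel operator to a grayscale image (2D float array) and
    return the gradient magnitude, as used before Hough circle detection."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Valid (no-padding) 3x3 correlation, accumulated kernel-tap by kernel-tap.
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)
```

A vertical step edge in the input produces a band of strong gradient magnitude at the step, which is what the subsequent circle detector operates on.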

Translation Estimation
In this work, the LiDAR and camera configuration on the vehicle was established with the same orientation of the three axes; hence, the translational difference is much more significant than the rotational difference. The translational difference between the sensors is estimated as follows.
Considering Equation (1) and assuming rotational invariance, the equation can be rewritten as follows. The only unknown variables in Equation (4) are the components of the translation vector (t_x, t_y, t_z). These components are computed by using the distance measurement based on the circle radius detected on the marker, as described in the previous section.
Before computing the components t_x and t_y of the translation vector, t_z is computed using the camera focal length f and the circle radii R and r estimated from the point cloud and the image, respectively, where d is the distance to the marker, as shown in Figure 10.
where R is the radius of the marker. Considering the distance d1 from the LiDAR to the marker and the distance d2 from the camera to the marker, both obtained by using Equation (5), the component t_z of the translation vector is obtained by using Equation (6) as follows.
After obtaining t_z, the remaining components of the translation vector can be obtained by using Equation (7).
Finally, the three components of the translation vector t = (t_x, t_y, t_z) are estimated for all four circles in the camera. The average of these four vectors is used to fuse the LiDAR and camera data to estimate the depth of the object.
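The pinhole relation behind Equations (5) and (6) can be sketched as follows: the camera-to-marker distance is recovered from the detected image radius and then differenced against the LiDAR-measured distance. The function name and argument units are our assumptions:

```python
def estimate_tz(f_px, R_true_m, r_img_px, d_lidar_m):
    """Sketch of Equations (5)-(6).

    f_px     : camera focal length in pixels
    R_true_m : true circle radius on the marker (m)
    r_img_px : circle radius detected in the image (pixels)
    d_lidar_m: LiDAR-to-marker distance (m), measured directly from the points
    """
    # Pinhole model: r = f * R / d  =>  camera-to-marker distance (Equation (5)).
    d_cam = f_px * R_true_m / r_img_px
    # t_z is the difference between the two sensor-to-marker distances (Equation (6)).
    return d_lidar_m - d_cam
```

For instance, with a 1000-pixel focal length, a 0.11 m hole radius imaged at 22 pixels implies a 5.0 m camera-to-marker distance; a 5.5 m LiDAR distance then yields t_z = 0.5 m.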

Rotation Estimation
After translation estimation, the rotation parameters are estimated to increase the accuracy of the calibration parameters. In order to estimate the rotation, we utilized the common least-squares best-fitting rigid-body transformation. Similar to the 3D marker board detection process, the edges of the marker are detected, and the corresponding points are used in Equation (8).
Next, we compute the weighted centered vectors. Finally, the estimated translation and rotation parameters are substituted in Equation (10) to fuse the LiDAR and camera data for the depth estimation.
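The least-squares best-fitting rigid-body transformation mentioned above is commonly computed with the SVD-based Kabsch algorithm from centered point sets; the sketch below, with our own function names, illustrates the technique rather than reproducing the paper's exact procedure:

```python
import numpy as np

def best_fit_transform(P, Q):
    """Least-squares rigid-body fit (Kabsch): find R, t minimising ||R p + t - q||.

    P, Q : Nx3 arrays of corresponding points (e.g. LiDAR and camera features).
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    # Centre the point sets, then take the SVD of the covariance matrix.
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```

Given four or more non-degenerate correspondences, this recovers the relative rotation and translation exactly in the noise-free case.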

Fusion of LiDAR and Camera Data
After estimating the intrinsic and extrinsic parameters, the LiDAR data points are projected onto the camera image using Equation (12). An example of this fusion is illustrated in Figure 11. The efficient fusion of data captured from multiple sensors is essential in order to build comprehensive perception for the autonomous vehicle. The data fusion process is able to make positive use of the sensors' redundant information to perceive the surrounding environment. Even though camera-based object detection and classification are very robust and efficient, it is difficult to obtain the accurate depth of detected objects. In contrast, the LiDAR sensor proves to be efficient in estimating the depth of objects but suffers in the classification of small and distant objects. Figure 11 shows a few scenarios in which it is very difficult to estimate the depth of an object using only a camera or only LiDAR. Hence, fusing the outputs of the camera and LiDAR greatly benefits object detection and classification in three dimensions. In this paper, the data fusion process is illustrated along with the image-based object detection and classification method presented in [33].
Initially, the raw measurements obtained from the LiDAR sensor are in standard polar coordinate form (r, θ, φ) and are converted into the 2D fusion representation p_2D of the camera and LiDAR data, in which u_i and v_i represent the 2D coordinates holding the camera intensity and LiDAR point index values. Figure 11 (point cloud alignment on the image) shows the output of the camera and LiDAR sensors using the calibration parameters, along with image-based object detection and classification. The upper images in Figure 11 show all the LiDAR points projected onto the image, and the lower images show only the points within the bounding boxes of the detected objects.
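The conversion of the raw polar (spherical) LiDAR measurements to Cartesian points, a prerequisite for the projection above, can be sketched as follows; the axis convention (x forward, z up) is our assumption:

```python
import numpy as np

def spherical_to_cartesian(r, azimuth, elevation):
    """Convert raw LiDAR measurements (range r, azimuth and elevation in
    radians) to Cartesian (X, Y, Z) points, vectorised over NumPy arrays."""
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    return np.stack([x, y, z], axis=-1)
```

A point at zero azimuth and elevation lies straight ahead on the x-axis; rotating the azimuth by 90 degrees moves it onto the y-axis.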

Distance Estimation
The purpose of the sensor fusion approach is to estimate object distance by utilizing the camera and LiDAR sensor data. Figure 12 presents an overview of the object distance estimation, and Figure 13 shows the fusion of the object detection and classification results from the camera with the calibrated LiDAR 3D points. From the sensor fusion result, the depth of an object is calculated from the calibrated LiDAR 3D points aligned inside the bounding box detected by the camera. Object detection in the camera follows the approach presented in [34,35]; we utilized the source code shared on GitHub [36] without any modification.
With the sensor calibration matrix from Equation (11), (u, v) denotes the monocular camera frame coordinates, and obj_i is the list of detected objects.
Each detected object obj_i is represented by a bounding box, as shown in Figure 13. The projected LiDAR points inside the bounding box are extracted to estimate the distance of the detected object using Equation (12).
The step-by-step algorithmic procedure for distance estimation is shown in Algorithm 1.
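The core of the distance estimation, averaging the ranges of the projected LiDAR points that fall inside a detected bounding box, can be sketched as follows (a simplified illustration; Algorithm 1 in the paper may include additional filtering):

```python
import numpy as np

def object_distance(uv, depths, bbox):
    """Estimate the distance of a detected object by averaging the depths of
    the projected LiDAR points that fall inside its bounding box.

    uv     : Nx2 projected pixel coordinates of the LiDAR points
    depths : N corresponding LiDAR ranges (m)
    bbox   : (u_min, v_min, u_max, v_max) of the detected object
    Returns None if no LiDAR point falls inside the box.
    """
    u0, v0, u1, v1 = bbox
    inside = (uv[:, 0] >= u0) & (uv[:, 0] <= u1) & \
             (uv[:, 1] >= v0) & (uv[:, 1] <= v1)
    if not np.any(inside):
        return None
    return float(depths[inside].mean())
```

Returning None for an empty box makes the sparse-points-at-range failure mode (discussed in the evaluation section) explicit to the caller.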

Experimental Results and Discussion
In this section, we describe the detailed experimental setup and performance evaluation process of the data fusion method within the context of depth sensing for an autonomous vehicle. Based on the current requirements of autonomous vehicles, the data fusion method was evaluated by considering the different aspects of environmental sensing.

Vehicle Platform Setup and Dataset Description
The fusion of LiDAR and camera data for an autonomous vehicle was performed with the hardware setup shown in Figure 14. The autonomous vehicle platform was equipped with a Velodyne HDL 64E-S2 LiDAR sensor for 3D acquisition and a Sekonix SF3321 camera for video streaming. Both the LiDAR sensor and the camera were aligned and forward-facing with respect to the vehicle. The positional displacement between the sensors is explained in detail in Figure 4. Prior to fusing the data acquired by the LiDAR and camera sensors, the sensors were calibrated to estimate the intrinsic and extrinsic parameters, as described in Section 3.
The experimental verification was conducted with a custom dataset acquired using the vehicle platform and sensor setup described in Figure 14. Multiple on-road scenarios were captured in an urban area of Hyeonpung-eup, Daegu, Republic of Korea, during the summer season in bright and sunny weather at vehicle speeds of 30 to 50 km/h. The camera frame rate was configured to 30 frames per second, whereas the Velodyne LiDAR capture rate was set to 10 frames per second. The images captured by the camera and the data acquired by the LiDAR sensor are presented in Figure 15 on the left and right, respectively. In order to evaluate the performance of the data fusion, the dataset was collected by considering the size, type, and varied distance of objects. The ability of the data fusion technique to accurately determine the distance to objects was evaluated by considering factors such as the vehicle speed and the operating frequency of each sensor.

Evaluation of Pixel-to-Point Correspondence
Tracking and estimating the distance of stationary and moving objects is a critical function of autonomous driving technology. In this evaluation, the accuracy of the data fusion method was verified by considering factors related to the distance of objects. The evaluation results presented in the table are the results of a statistical analysis of the data fusion accuracy when the distance of an object is varied, as shown in Figure 16. Table 1 shows the results of the accuracy analysis of the LiDAR pixel-to-point correspondence process when the distance to objects is varied. In this evaluation, a planar marker board with dimensions of 1 m × 1.5 m was used as the target object in three test cases, with the distance varied to 20, 40, and 60 m from the vehicle. The corners of the planar marker board were highlighted with green markings for reference in the camera image, and the corresponding corner points in the LiDAR data were referenced using point cloud indices. After fusing the camera and LiDAR data, each reference point from the camera was compared with the corresponding LiDAR point to estimate the accuracy of the alignment. The statistical analysis provided in Table 1 shows that, as distance increases, the point-to-pixel correspondence becomes more difficult to determine because of the lack of points at long range. Sensors with a finer vertical angular resolution would be a better choice for estimating the distance of objects farther away.
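The alignment accuracy measure used in this evaluation can be sketched as a mean pixel error between the hand-marked reference corners and the corresponding projected LiDAR points; this is our simplified reading of the alignment score, not the paper's exact formula:

```python
import numpy as np

def alignment_error(ref_px, projected_px):
    """Mean Euclidean pixel error between reference corner pixels and the
    corresponding projected LiDAR points (lower is better alignment)."""
    ref = np.asarray(ref_px, dtype=float)
    proj = np.asarray(projected_px, dtype=float)
    return float(np.linalg.norm(ref - proj, axis=1).mean())
```

A perfectly aligned correspondence contributes zero to the score, so the mean grows directly with misalignment in pixels.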

Depth Estimation
The purpose of this evaluation was to verify the performance of the data fusion method in the context of object depth estimation for autonomous vehicles. We utilized the point-cloud-aligned camera view for the depth estimation of vehicles in the surrounding environment: the depth of each vehicle was estimated as the average of the LiDAR points aligned on the image pixels belonging to that vehicle. The object detection methodology itself is not described in this paper, as we consider it beyond the scope of the proposed method.
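The averaging step can be sketched as follows. This is an illustrative implementation, not the authors' code: the point format `(u, v, depth)` and the axis-aligned bounding box are assumptions for the sake of the example.

```python
def object_depth(fused_points, bbox):
    """Estimate object depth as the mean depth of the projected LiDAR
    points that fall inside the object's image bounding box.

    fused_points: list of (u, v, depth_m) LiDAR points already
                  projected onto the image plane.
    bbox:         (u_min, v_min, u_max, v_max) of a detected object.
    Returns the mean depth in metres, or None if no points align
    with the box (e.g. a distant object outside LiDAR range)."""
    u0, v0, u1, v1 = bbox
    depths = [d for (u, v, d) in fused_points
              if u0 <= u <= u1 and v0 <= v <= v1]
    return sum(depths) / len(depths) if depths else None
```

Taking the mean over all aligned points makes the estimate robust to individual noisy returns, at the cost of slight bias when the box also covers background points.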
The results of the statistical analysis of the experiment conducted using the on-road scenarios are presented in Table 2, and screenshots of the corresponding scenes are shown in Figure 17. Six random scenes containing vehicles at various distances were considered for the evaluation; these scenes were captured while the vehicle was traveling at 45 km/h. The scope of this evaluation is to verify the significance of the sensor fusion approach in comparison with camera-only or LiDAR-only distance measurement for an autonomous vehicle system: typical vision-based distance estimation methods are not accurate enough, and although LiDAR-based systems are very accurate in estimating distance, they are time-consuming and have a low object classification rate. Hence, this evaluation demonstrates, over multiple scenarios, the significance of fusing the classification ability of the camera with the accurate distance measurement of the LiDAR. As the table shows, the proposed fusion method was able to estimate the distance of the smaller objects p1, p2, p3, and p4 even at distances of more than 40 m. Scene 5, in which a motorbike was partially occluded by a car, confirms that distance estimation also holds in occluded scenarios. Hence, the sensor fusion approach appears to be more reliable than a camera or LiDAR sensor alone, even for smaller and occluded objects.

Quantitative Analysis
In this analysis, the performance of the proposed sensor fusion method was compared with existing single-sensor-based object detection methods. The main scope of this analysis is to verify the significance of the sensor fusion approach for object distance estimation. For the experiment, we used a set of scenarios in which the distances of the objects (a car and a pedestrian) were varied over 30, 50, 80, and 100 m, as shown in Figure 18. Owing to the complexity of establishing ground truth, the experiment was conducted with a limited number and variety of objects (car and pedestrian). If the experimental scenario is extended to more general road conditions, the object detection rate decreases, whereas the accuracy of the distance estimation remains unchanged with respect to the number and types of objects; hence, the distance accuracy analysis presented in Tables 3-5 holds for more general road scenarios as well. Tables 3-5 present the performance analysis of object detection based on the LiDAR, camera, and proposed fusion methods, and Table 6 presents a comparison of distance accuracy between the methods. The criteria used for the analysis are the detection rate and the distance accuracy at different distances. The detection rate is the ratio of the total number of objects detected to the actual number of objects that must be detected. The results show that the detection rate is 100% only within a specific range for each sensor: the LiDAR detection rate is 100% up to 50 m, after which it fails because of the sparsity of points, whereas with the camera it is possible to detect objects up to 80 m, although distance estimation is poor beyond 60 m.
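The two criteria used in Tables 3-6 can be written out explicitly. The formulas below follow the definitions in the text; the percentage form of the distance accuracy is an assumption, since the paper does not spell out its exact normalization.

```python
def detection_rate(num_detected, num_actual):
    """Detection rate (%): ratio of detected objects to the actual
    number of objects that must be detected."""
    return 100.0 * num_detected / num_actual

def distance_accuracy(estimated_m, ground_truth_m):
    """Distance accuracy (%): assumed here to be the relative error
    of the estimate subtracted from 100%."""
    return 100.0 * (1.0 - abs(estimated_m - ground_truth_m) / ground_truth_m)

# Example: 5 of 5 cars detected, one estimated at 49.5 m vs. 50 m truth.
print(detection_rate(5, 5))            # 100.0
print(distance_accuracy(49.5, 50.0))   # 99.0
```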

Accuracy Comparison
The purpose of the data fusion task is to assist subsequent processes in the system, and it is difficult to draw meaningful conclusions by analyzing the accuracy of the data fusion process alone. Thus, in this work, the accuracy of the data fusion is analyzed within the scope of distance estimation for autonomous vehicles. Since it is very hard to obtain ground truth references for real-road experiments, we used simulator data to estimate the accuracy of the distances computed by the proposed method. In this evaluation, data logging was performed in LGSVL (LG Silicon Valley Lab), an open-source autonomous vehicle simulation tool from the LG Electronics America R&D Center [44]. The vehicle setup and sensor placement in the simulation tool were modified to match the real vehicle's sensor configuration, including the LiDAR and camera specifications. Data logging for the accuracy evaluation was conducted in the San Francisco city environment provided with the simulator.
As shown in Figure 19, small and large vehicles and pedestrians were selected as objects in the environment, and data logging was performed at a speed of 40 km/h. The obtained results were compared with the known ground truth to measure the error rate, as shown in Table 7. The maximum distance error was found to be 6 cm at vehicle distances of more than 60 m. The statistical analyses from the real-road and simulated environments, provided in Tables 6 and 7, respectively, show that the proposed method has the potential to be used in autonomous vehicles for efficient sensing of the environment with multiple sensors. In this analysis, the quality of the distance estimation approach was also evaluated using the mean squared error (MSE) computed from the actual and estimated distances in the simulated environment. The sample data were acquired by considering objects at different radial distances from 10 m to 80 m. The MSE of the estimated object distances is computed by Equation (14).
$\mathrm{MSE}_r = \frac{1}{N_r} \sum_{i=1}^{N_r} \left( d_i - \hat{d}_i \right)^2$ (14)

where $N_r$ is the total number of objects within the specific radius $r$ (10, 30, 50, 80 m), $d_i$ is the actual distance of object $i$ observed in the simulated environment, and $\hat{d}_i$ is the distance computed using the proposed approach. Table 8 depicts the results of the experiment conducted in the simulated environment and shows that the average error rate at each radius level increases with distance. However, in comparison to the single-sensor-based approach, our fusion of LiDAR and camera remains reliable up to long distances. Based on the analysis, object distance estimation with the sensor fusion approach is improved in terms of accuracy, assuming the camera's detection ability is accurate. Hence, utilizing the sensor fusion approach for object detection and distance estimation increases performance in terms of both accuracy and reliability.
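Equation (14) is straightforward to compute per radius level; a minimal sketch is given below. The sample distances in the example are hypothetical, not values from Table 8.

```python
def mse_at_radius(actual_m, estimated_m):
    """Mean squared error (Eq. 14) between actual and estimated
    object distances for all objects within one radius level."""
    assert len(actual_m) == len(estimated_m) and actual_m
    n = len(actual_m)
    return sum((d - d_hat) ** 2 for d, d_hat in zip(actual_m, estimated_m)) / n

# Hypothetical objects at the r = 10 m radius level:
print(mse_at_radius([9.8, 10.2, 10.0], [9.85, 10.15, 10.02]))
```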

Time Performance Analysis
The processing time of the proposed depth estimation method mainly depends on the projection of 3D points to 2D image points using the calibration parameters. In this paper, the time performance evaluation was conducted on platforms with and without a GPU (graphics processing unit). The average results over five iterations are presented in Table 9.
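The 3D-to-2D projection that dominates the processing time is the standard pinhole mapping p = K(RX + t) followed by division by depth. The sketch below illustrates it for a single point; it is a plain-Python illustration of the geometry, not the paper's (GPU-accelerated) implementation, and the calibration values in the test are synthetic.

```python
def project_point(X, K, R, t):
    """Project a 3D LiDAR point X = (x, y, z) into the image plane
    using the extrinsic (R, t) and intrinsic (K) calibration.
    Returns (u, v, depth) with depth in the camera frame."""
    # Transform into the camera frame: Xc = R X + t
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Pinhole projection with K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v, xc[2]
```

Projecting a full Velodyne sweep repeats this matrix-vector product for every point, which is why the operation parallelizes well on a GPU.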

Conclusions
This paper proposed an object distance estimation method for driverless vehicles based on fusing low-level data from different types of sensors, specifically LiDAR and camera data acquired in real time. The first step of the process is to calibrate the LiDAR and camera sensors, which involves estimating the extrinsic parameters along with the camera's intrinsic parameters. The camera's intrinsic parameters were estimated using the traditional checkerboard calibration method, and the LiDAR-camera extrinsic parameters were estimated using a planar 3D marker board. Next, the LiDAR points are mapped onto the camera image using the calibration parameters. Finally, the fused LiDAR and camera output was verified with multiple performance evaluation methods. The main advantage of the proposed data fusion approach is its efficient depth estimation process for autonomous driving systems.
In addition, the statistical analysis of the experimental results from real-road scenarios with a custom vehicle setup demonstrates the potential of the proposed method for the depth estimation of surrounding vehicles in an autonomous system. Compared with other data fusion approaches, the proposed method is able to accurately estimate the extrinsic parameters, perform the data fusion, and determine object distances. Future work toward refining the proposed data fusion approach includes point cloud distortion compensation using inertial measurement unit (IMU) sensors and elimination of the operating-frequency limitation of the data fusion process by using interpolation methods.