A Novel Multimodal Fusion Framework Based on Point Cloud Registration for Near-Field 3D SAR Perception

Abstract: This study introduces a pioneering multimodal fusion framework to enhance near-field 3D Synthetic Aperture Radar (SAR) imaging, crucial for applications like radar cross-section measurement and concealed object detection. Traditional near-field 3D SAR imaging struggles with issues like target–background confusion due to clutter and multipath interference, shape distortion from high sidelobes, and lack of color and texture information, all of which impede effective target recognition and scattering diagnosis. The proposed approach presents the first known application of multimodal fusion in near-field 3D SAR imaging, integrating LiDAR and optical camera data to overcome its inherent limitations. The framework comprises data preprocessing, point cloud registration, and data fusion, where registration between multi-sensor data is the core of effective integration. Recognizing the inadequacy of traditional registration methods in handling varying data formats, noise, and resolution differences, particularly between near-field 3D SAR and other sensors, this work introduces a novel three-stage registration process to effectively address these challenges. First, the approach designs a structure–intensity-constrained centroid distance detector, enabling key point extraction that reduces heterogeneity and accelerates the process. Second, a sample consensus initial alignment algorithm with SHOT features and geometric relationship constraints is proposed for enhanced coarse registration. Finally, the fine registration phase employs adaptive thresholding in the iterative closest point algorithm for precise and efficient data alignment. Both visual and quantitative analyses of measured data demonstrate the effectiveness of our method. The experimental results show significant improvements in registration accuracy and efficiency, laying the groundwork for future multimodal fusion advancements in near-field 3D SAR imaging.


Introduction
Near-field 3D synthetic aperture radar (SAR) imaging can obtain the three-dimensional electromagnetic scattering structure of observed targets and restore their spatial position information, which has become an important trend in the development of SAR [1][2][3][4][5]. In recent years, near-field 3D SAR imaging has been increasingly applied in concealed object detection and radar cross-section (RCS) measurement [6]. Owing to its capability of working under all-day and all-weather conditions, the near-field 3D SAR system is not only unaffected by environmental factors such as light and smoke, but is also able to reconstruct items under clothing or within boxes [7]. It is suitable for deployment in airports, high-speed railway stations, and other security-check settings. Compared to microwave anechoic chamber measurement, near-field 3D SAR systems can perform RCS measurement on the target quickly, which is beneficial for radar stealth evaluation and scattering diagnosis [8].
However, near-field 3D SAR encounters several challenges. First, the clutter, multipath interference, and noise mixed in the images obscure target-background differentiation. Second, the presence of sidelobes results in a blurry shape and structure loss of the target, which affects the scattering diagnosis of specific parts of the target. Third, near-field SAR images are limited to capturing scattering intensity and do not provide color or texture information, which complicates the accurate categorization of targets. These limitations lower the quality of perception and hinder subsequent tasks like scattering diagnosis, detection, recognition, and interpretation.
Research into scene perception based on multi-sensor fusion has recently become a hot topic [9][10][11]. Multi-sensor fusion can integrate complementary multimodal data to broaden the working conditions and obtain more informative fusion results. Existing work has fused 2D SAR images with optical, hyperspectral, and infrared images to assist in SAR image interpretation [12][13][14], and has been applied in fields such as remote sensing surveys and disaster detection. Yinghui Quan et al. [15] developed a multi-spectral and SAR image fusion method based on weighted median filtering and the Gram-Schmidt transform to improve the classification accuracy of land cover. For multi-sensor 3D SAR fusion, Xiaolan Qiu et al. [16] imaged a building using the unmanned aerial microwave vision 3D SAR (MV3DSAR) experimental system and LiDAR, and demonstrated the fusion results of LiDAR point clouds and reconstructed interferometric SAR point clouds, but did not provide the corresponding registration and fusion methods. It can be seen that research on the fusion of near-field 3D SAR with other heterogeneous sensors is just beginning.
Common sensors include radar, LiDAR, and cameras. LiDAR detects targets using emitted lasers, which can accurately measure distance. The captured laser point cloud can accurately describe the geometric shape, structure, and size of the target. However, its operation is greatly affected by weather, and the laser attenuates severely in environments such as heavy rain, thick smoke, and fog [17]. Optical cameras capture visible light reflected from the surface of an object for imaging, which can obtain detailed information such as the color and texture of the object. The resolution of visible light images is high, which is more in line with human cognition. However, they are greatly affected by light during operation, resulting in poor imaging results at night [18]. Due to the strong penetration of electromagnetic waves, radar can work in harsh weather, but its imaging resolution is low and lacks details [19]. To improve the capabilities of near-field 3D SAR images in scattering diagnosis and detection, this study presents the first research on multimodal fusion with near-field 3D SAR, LiDAR, and optical camera. The interference in SAR images can be suppressed by utilizing LiDAR's precision in target localization and shape description, which helps scattering diagnosis. The color and texture information of optical images can aid in categorizing objects in near-field SAR images, enhancing the perception of a scene.
Multimodal sensing uses heterogeneous sensors to capture more comprehensive scene information, and effectively addresses the aforementioned deficiencies by aggregating multi-sensor data through fusion [20]. The key to achieving multi-sensor data fusion is to solve the problem of coordinate system alignment, that is, to find the relative pose between the different coordinate systems. Here, the pose refers to both the position and the orientation of a subject. Two commonly used methods are calibration and registration [21]. The calibration method not only requires the manual design of a calibration object, but the sensors also need to be recalibrated whenever their relative pose changes, which is not flexible enough [22]. Therefore, this study adopts point cloud registration to achieve multimodal data fusion for near-field 3D SAR perception.
The existing point cloud registration research mainly focuses on homogeneous point cloud registration or LiDAR-Camera point cloud registration, while there is no published research on point cloud registration methods for near-field 3D SAR and other sensors. In 2014, Furong Peng et al. [23] first analyzed the significant differences in point cloud density, sensor noise, scale, and occlusion in multi-sensor point cloud registration, and then proposed a two-stage registration algorithm. By combining coarse registration based on the ensemble of shape functions (ESF) descriptor with iterative closest point (ICP) [24] fine registration, they registered LiDAR point clouds and optical structure from motion (SFM) [25] reconstruction point clouds of street buildings. In 2015, Nicolas Mellado et al. [26] proposed a method for registering LiDAR point clouds and optical multi-view stereo (MVS) reconstruction point clouds. This method first achieved scale-invariant matching through the growing least squares descriptor, and then used the random sample consensus (RANSAC) method [27] for spatial transformation. In 2016, Xiaoshui Huang et al. [28] improved on the work of Furong Peng et al. [23] by using an improved generative Gaussian mixture model in the fine registration stage to achieve the high-precision fusion of street view LiDAR and SFM point clouds. In 2017, Xiaoshui Huang et al. [29] applied graphs to describe the structures extracted from multi-sensor point clouds, and used an improved graph matching method with global geometric constraints to obtain the graph matching results. After that, RANSAC and ICP were used to refine and complete the registration fusion of SFM and Kinect point clouds. In 2021, Jie Li et al. [30] utilized a unified simplified expression of geometric elements in conformal geometric algebra to construct the matching relationship between points and spheres, obtaining a more accurate alignment of LiDAR and Kinect point clouds.
From the above research, we can infer that the ICP algorithm is currently the most widely used point cloud registration method [31]. However, the ICP algorithm has strict requirements for the initial pose of the two input point clouds, and it easily falls into local optima when there are significant differences in the initial pose. In order to provide a good initial pose for the ICP algorithm, coarse registration algorithms such as the RANSAC method and its variants are generally used to roughly align the input point clouds. Currently, multi-source point cloud registration mostly uses this coarse-to-fine registration method [32].
However, the different imaging mechanisms of multiple sensors also pose some challenges to multimodal fusion. Lahat et al. [33] identified the challenges in multimodal data fusion and divided them into two parts: the challenges caused by data collection and the challenges caused by the data source. In the fusion of SAR, LiDAR, and camera, these challenges are manifested as follows: (1) Data format differences: near-field 3D SAR images are mainly obtained by imaging radar echoes using the back projection (BP) algorithm [34] and are expressed in voxels, while the LiDAR imaging result is a point cloud and the optical camera captures 2D images. (2) Noise differences: 3D SAR images contain clutter and background noise, whereas LiDAR point clouds and optical reconstructed point clouds contain outliers. (3) Resolution differences: the frequencies of the microwave, laser, and visible light used in SAR, LiDAR, and cameras increase in that order, so optical images have the highest resolution, followed by LiDAR point clouds, and SAR images have the lowest. Due to these challenges, existing point cloud registration methods cannot effectively select corresponding points, making it difficult to achieve efficient and high-precision multimodal data alignment for near-field 3D SAR.
Based on the current state of research, there are no detailed public research results on multimodal fusion for 3D SAR, especially near-field 3D SAR, which holds significant application value in areas such as scattering diagnosis and perception. Moreover, the fusion of 3D SAR, LiDAR, and camera presents its own unique challenges that are not suitably addressed by current methods, which are primarily aimed at homogeneous point cloud fusion. With this background in mind, and following the trend of multimodal sensing for 3D SAR, we conduct a preliminary study in this work.
To address existing challenges, this study develops a novel multimodal fusion framework for near-field 3D SAR, consisting of data preprocessing, point cloud registration, and data fusion. For preprocessing, 3D SAR images are converted into point clouds and optical point clouds are reconstructed using SFM, thus standardizing the data format. This is followed by noise removal and target feature extraction from the multimodal data. For registration, LiDAR point clouds, known for their precise positioning and shape accuracy, act as an intermediate bridge for SAR-LiDAR and LiDAR-Camera pairwise registration to achieve the spatial alignment of all three sensors. The final fusion step integrates multimodal data of varying resolutions by adding optical color textures and SAR scattering intensity to the LiDAR point clouds.
The registration process introduces a three-stage multi-sensor point cloud registration method, comprising key point extraction, coarse registration, and fine registration. Initially, a centroid distance (CED) key point extraction method with dual constraints of geometric structure and intensity is used to extract key points from the point cloud. Next, the method employs a sample consensus initial alignment (SAC-IA) coarse registration method with mixed constraints of geometric triangulation and the signature of histograms of orientations (SHOT) feature to achieve the initial pose transformation. The final step, based on the initial pose transformation, applies an adaptive-thresholding ICP fine registration algorithm for precise pose adjustment. The method improves registration efficiency through key point extraction, which also reduces point cloud heterogeneity, and improves registration accuracy through multiple constraint terms constructed from prior knowledge. Through the above point cloud registration method, the proposed multimodal data fusion framework performs LiDAR-SAR and LiDAR-Camera point cloud registration to obtain aligned SAR-LiDAR-Camera three-sensor data. After that, the nearest neighbor search algorithm is used to remove the redundancy of SAR point clouds, and the multi-sensor point cloud fusion results are obtained. The experimental data were captured by our prototype hardware system, and the processing results demonstrate the fusion of near-field 3D SAR with LiDAR and optical cameras, while verifying the effectiveness of the proposed point cloud registration method and multimodal fusion framework.
Our main contributions are as follows:
• This work presents the first attempt to enhance the perception quality of near-field 3D SAR imaging from a multi-sensor data fusion perspective, uniquely combining near-field 3D SAR with LiDAR and optical camera data to address the inherent limitations;
• This work designs a multimodal fusion framework for effectively integrating data from near-field 3D SAR, LiDAR, and a camera, which consists of three main components: data preprocessing, point cloud registration, and data fusion;
• This work introduces a novel three-stage registration algorithm tailored to overcome the heterogeneity issue across sensors. This algorithm includes (1) a new key point extraction method that improves the CED algorithm with structure-intensity dual constraints, (2) an enhanced coarse registration technique that integrates geometric relationship and SHOT feature constraints into SAC-IA for improved initial alignment, and (3) an adaptive-thresholding ICP algorithm for precise fine registration;
• This work validates the proposed approach using data collected from our SAR-LiDAR-Camera prototype system. The experimental results demonstrate clear improvements in registration accuracy and efficiency over existing methods. The quantitative and qualitative results underscore the effectiveness of our multimodal fusion approach in overcoming the inherent limitations of near-field 3D SAR imaging.
The rest of this paper is organized as follows: Section 2 describes the materials adopted, including the system and the collected data. The specific framework for the fusion of data from SAR, LiDAR, and the camera is presented in Section 3. Section 4 describes the experimental results and gives a discussion of the proposed framework. Finally, we summarize the paper in Section 5 and provide some prospects for future work.

Materials
The proposed framework is designed for near-field 3D SAR perception. The near-field 3D SAR is a type of radar imaging system that actively transmits electromagnetic waves to the observed target. These transmitted waves are often in the X band for applications like scattering diagnosis, and the W band for applications like person screening. The corresponding wavelengths range from the centimeter to the millimeter level. Objects observed in these bands appear different from human visual perception. For instance, some parts of the target might appear missing, as seen in the head of the aircraft models in Figure 1a,b. The resolution is also limited, making the grid on the surface of the satellite model appear ambiguous. Furthermore, the color of the radar image, which reflects the scattering intensity of the target, varies significantly from visual perception. These limitations make scattering diagnosis, detection, recognition, and interpretation challenging. Compared to radar, other sensors like LiDAR and cameras can supplement information. LiDAR is an active sensing method that uses electromagnetic waves of a much higher frequency and a much shorter wavelength, such as 905 nm in our prototype system, achieving higher resolution as shown in Figure 2. The camera sensor is a passive sensing method that relies on the light illuminating and reflected by the object. The related electromagnetic wave lies in the spectrum of visible light, with a wavelength range between 380 and 700 nm. The resulting optical image provides color information, revealing the texture of the object in line with our visual perception, as shown in Figure 3. By fusing this additional information, the radar image (the near-field 3D SAR image) can be perceived more easily and comprehensibly. This relies on the accurate fusion framework detailed in the next section.
In the data capture system, the millimeter wave near-field array 3D SAR imaging system serves to obtain near-field 3D SAR images, the Spedal monocular camera captures multi-view optical images, and the Livox Avia LiDAR acquires LiDAR point clouds. As the scanning time increases, the density of the Livox LiDAR point cloud increases, and the final point cloud obtained clearly shows the shape contours of the target. The imaging resolution of the Spedal monocular camera is 1920 × 1080 pixels.
Figure 4 shows the experiment scene of the near-field array 3D SAR system. By moving the RF module on the horizontal and vertical rails, horizontal and vertical two-dimensional scanning is completed, and the virtual synthetic aperture is formed. The center frequency of the system's transmission signal is 78.8 GHz, with a maximum transmission signal bandwidth of 4 GHz. The array size of the system is 0.4 m × 0.4 m and the operating distance is 1 m. The range resolution can reach 3.75 cm, and the azimuth and height resolutions can reach the millimeter level. The size of the 3D SAR image in the range, azimuth, and height directions is 256 × 408 × 200.
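The quoted range resolution can be checked against the standard relation ρ_r = c / (2B) for a signal of bandwidth B. The short sketch below does this arithmetic for the 4 GHz bandwidth and also prints the free-space wavelength at the 78.8 GHz center frequency; the function names are illustrative and not part of the system software.

```python
# Sanity check of the radar parameters quoted above (illustrative helper names).
C = 299_792_458.0  # speed of light in vacuum, m/s


def range_resolution(bandwidth_hz: float) -> float:
    """Theoretical range resolution c / (2B), in metres."""
    return C / (2.0 * bandwidth_hz)


def wavelength(frequency_hz: float) -> float:
    """Free-space wavelength c / f, in metres."""
    return C / frequency_hz


rho_r = range_resolution(4e9)   # 4 GHz maximum bandwidth
lam = wavelength(78.8e9)        # 78.8 GHz center frequency

print(f"range resolution: {rho_r * 100:.2f} cm")  # range resolution: 3.75 cm
print(f"wavelength: {lam * 1000:.2f} mm")         # wavelength: 3.80 mm
```

The millimeter-scale wavelength is consistent with the W-band operation described earlier in this section.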
Experiments have been conducted using multi-source data collected from four targets: aircraft model 1, aircraft model 2, pincer, and satellite model. Figure 2 shows the scene of near-field 3D SAR image acquisition, the original near-field 3D SAR imaging results, and the results obtained through the near-field SAR preprocessing process detailed in Section 3.1.1.

Methodology
The overall flowchart of the proposed near-field SAR multimodal fusion framework is shown in Figure 5. The framework consists of data preprocessing, point cloud registration, and data fusion. In the data preprocessing stage, near-field 3D SAR imaging, LiDAR imaging, and optical 3D reconstruction are first performed independently on measured data obtained from the corresponding sensors. Then, filtering operations are used to remove random or spurious points. After filtering, down-sampling reduces point density, makes the density more uniform, and improves computational efficiency. Point clouds generated by LiDAR and optical sensors can be extremely dense. Down-sampling reduces the number of points in the cloud, making it more manageable for subsequent processing steps. It also helps to achieve uniform point densities across the entire point cloud, ensuring that there are no areas with excessively high point density or gaps. In addition, processing and analyzing dense point clouds requires significant computational resources, and down-sampling reduces this burden while still retaining essential spatial information. Finally, segmentation operations are used to extract the target point cloud. In the point cloud registration stage, a novel three-stage registration method including key point extraction, coarse registration, and fine registration is performed to obtain the pose transformation matrix of the multi-sensor point clouds. This stage is the core of our fusion framework and is explained in detail later in this section. In the data fusion stage, the pose transformation matrix is used to align the three point clouds, and the nearest neighbor point search algorithm is used to remove the redundancy of SAR to obtain the multi-sensor point cloud fusion result.
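To make the data fusion stage more concrete, the following is a minimal sketch of attaching a source attribute (for example SAR scattering intensity) to LiDAR points by nearest neighbor search after alignment. The text does not specify the exact redundancy-removal rule, so the distance gate `max_dist`, the use of NaN for unmatched points, and all function names are assumptions made for illustration.

```python
import numpy as np


def nearest_neighbor(query, ref):
    """Brute-force search: index and distance of the closest ref point per query point."""
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(query)), idx])
    return idx, dist


def transfer_attribute(lidar_xyz, src_xyz, src_attr, R, t, max_dist):
    """Move source points into the LiDAR frame with (R, t), then copy to each
    LiDAR point the attribute of its nearest source point.
    LiDAR points with no source neighbor within max_dist get NaN (assumed rule)."""
    aligned = src_xyz @ R.T + t
    idx, dist = nearest_neighbor(lidar_xyz, aligned)
    attr = np.asarray(src_attr, dtype=float)[idx].copy()
    attr[dist > max_dist] = np.nan
    return attr


# Toy data: three LiDAR points, two SAR points already expressed in the LiDAR frame.
lidar = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
sar = np.array([[0.01, 0.0, 0.0], [1.0, 0.01, 0.0]])
sar_intensity = np.array([0.8, 0.3])

fused_intensity = transfer_attribute(
    lidar, sar, sar_intensity, np.eye(3), np.zeros(3), max_dist=0.05
)
print(fused_intensity)  # first two points receive 0.8 and 0.3, the third stays NaN
```

The same helper works for per-point RGB values of shape (N, 3) from the optical point cloud, since the attribute rows are copied as a whole.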

The preprocessing of the near-field 3D SAR image to extract targets is shown in Figure 6a. The near-field 3D SAR image is generated using the BP algorithm. It is then converted into a point cloud format through global threshold filtering. Here, any voxel in the 3D image grid above the threshold is retained as a point in the point cloud. The threshold is set based on the specific dynamic range required. Based on the approximate position of the target in the observation scene, points outside the imaging area of the SAR point cloud are removed through passthrough filtering. Then, threshold extraction is performed to filter out low-scattering background noise and interference by setting a threshold on the absolute value of the SAR scattering intensity. The sidelobes in the near-field 3D SAR point cloud are significant: they blur the true distance and shape of the target and introduce outliers into registration. Therefore, the sidelobes are removed by retaining only the maximum scattering intensity along the range (distance) direction, so that the main lobes are kept. Next, statistical filtering is used to process the near-field 3D SAR point cloud to remove discrete strong scattering noise, which would otherwise affect the subsequent point cloud feature calculation and registration. Finally, Euclidean distance clustering segmentation [35] is used to extract the target point cloud. After filtering out the noise, the near-field SAR point cloud is sparsely distributed in space, which is suitable for Euclidean distance segmentation.
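As a rough illustration of the voxel-to-point-cloud conversion, the sketch below combines two of the steps above: it keeps only the strongest range bin for each (azimuth, height) cell, and then applies a global threshold expressed as a dynamic range in dB below the peak. The axis order, the dB convention for amplitudes, the voxel spacing, and the function name are assumptions; passthrough filtering, statistical filtering, and clustering are omitted.

```python
import numpy as np


def sar_volume_to_points(volume, spacing, dynamic_range_db=20.0):
    """Convert a 3D SAR magnitude volume (range, azimuth, height) to a point cloud.

    For every (azimuth, height) cell only the range bin with the maximum
    magnitude is kept (sidelobe suppression along range), and the cell is
    dropped if that maximum lies more than dynamic_range_db below the peak."""
    mag = np.abs(volume)
    threshold = mag.max() * 10.0 ** (-dynamic_range_db / 20.0)

    r_idx = mag.argmax(axis=0)  # strongest range bin per cell, shape (n_az, n_h)
    a_idx, h_idx = np.meshgrid(
        np.arange(mag.shape[1]), np.arange(mag.shape[2]), indexing="ij"
    )
    strongest = mag[r_idx, a_idx, h_idx]
    keep = strongest >= threshold

    voxel_index = np.stack([r_idx[keep], a_idx[keep], h_idx[keep]], axis=1)
    xyz = voxel_index * np.asarray(spacing, dtype=float)  # voxel index -> metres
    return xyz, strongest[keep]


# Toy volume: one main lobe, one weaker sidelobe in the same cell, one noise voxel.
volume = np.zeros((4, 3, 2))
volume[2, 1, 0] = 1.0    # main lobe
volume[1, 1, 0] = 0.5    # sidelobe along range in the same (azimuth, height) cell
volume[3, 0, 1] = 0.01   # weak background, 40 dB below the peak

xyz, intensity = sar_volume_to_points(volume, spacing=(0.01, 0.002, 0.002))
print(xyz, intensity)  # a single point at the main lobe position
```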

LiDAR Preprocessing
The preprocessing of the LiDAR point cloud to extract targets is shown in Figure 6b. The points outside the target area in the LiDAR point cloud are filtered out through passthrough filtering to reduce the size of the point cloud. Then, the LiDAR point cloud is processed through octree voxel down-sampling to facilitate the subsequent correspondence search with the voxel-transformed SAR point cloud. Because the standard octree voxel down-sampling method retains the centroid of each voxel as the sampling point, rather than a point from the original point cloud, the detailed features of the point cloud are destroyed. Therefore, the process instead selects the original point closest to the centroid of each voxel as the sampling point. Next, statistical filtering is used to filter out outliers and noise in the LiDAR point cloud. Finally, the M-estimator sample consensus (MSAC) algorithm [36] is used to obtain the platform plane where the target is located and remove it. The LiDAR point cloud of the target is segmented using the Euclidean distance clustering segmentation method mentioned in Section 3.1.1.
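The modified down-sampling rule, keeping the original point nearest to each voxel centroid, can be sketched as follows. For simplicity the sketch uses a regular voxel grid of fixed size instead of an explicit octree; this substitution, the `voxel_size` parameter, and the function name are assumptions for illustration.

```python
import numpy as np


def voxel_downsample_nearest(points, voxel_size):
    """Return the indices of one original point per occupied voxel:
    the point closest to the centroid of the points inside that voxel."""
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)            # voxel id of every point
    n_vox = inverse.max() + 1

    # Per-voxel centroid of the contained points.
    counts = np.bincount(inverse, minlength=n_vox)
    centroids = np.zeros((n_vox, 3))
    np.add.at(centroids, inverse, points)
    centroids /= counts[:, None]

    # Sort by voxel id, then by distance to the centroid; take the first of each group.
    dist = np.linalg.norm(points - centroids[inverse], axis=1)
    order = np.lexsort((dist, inverse))
    first = np.ones(len(points), dtype=bool)
    first[1:] = inverse[order][1:] != inverse[order][:-1]
    return np.sort(order[first])


cloud = np.array([
    [0.1, 0.1, 0.1],
    [0.2, 0.2, 0.2],
    [0.9, 0.9, 0.9],
    [1.5, 1.5, 1.5],
])
keep = voxel_downsample_nearest(cloud, voxel_size=1.0)
print(keep)  # [1 3]: point 1 is nearest to the centroid (0.4, 0.4, 0.4) of the first voxel
```

Because the function returns indices rather than new coordinates, per-point attributes such as intensity or color can be subsampled with the same index array.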

Camera Preprocessing
The preprocessing of the multi-view optical images to extract targets is shown in Figure 6c. Multi-view optical images are reconstructed through the SFM algorithm to obtain optical point clouds, and the point cloud of the target area is obtained by passthrough filtering. Then, the optical point cloud is down-sampled using the octree voxel down-sampling method described in Section 3.1.2, which reduces the size of the point cloud while preserving the structural features of the target and reducing the resolution differences with the LiDAR and SAR point clouds. Next, statistical filtering is applied to the optical point cloud to remove outliers generated during SFM reconstruction. Finally, a color-based region growth segmentation method [37] is used to segment the target. Optical point clouds have abundant color and texture information. The color-based region growth segmentation method utilizes color differences between points for clustering, which can effectively segment optical point clouds.
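Statistical filtering appears in all three preprocessing chains. The text does not state the exact rule, so the sketch below assumes the common formulation in which a point is rejected when its mean distance to its k nearest neighbors exceeds the global mean of that quantity by more than `std_ratio` standard deviations; `k`, `std_ratio`, and the function name are illustrative choices.

```python
import numpy as np


def statistical_filter(points, k=8, std_ratio=1.0):
    """Boolean mask of inliers based on the mean distance to the k nearest neighbors."""
    points = np.asarray(points, dtype=float)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, 1:k + 1]       # column 0 is the zero self-distance
    mean_knn = knn.mean(axis=1)
    limit = mean_knn.mean() + std_ratio * mean_knn.std()
    return mean_knn <= limit


# Toy cloud: a 3 x 3 x 3 unit grid plus one far-away outlier.
grid = np.array(
    [[i, j, l] for i in range(3) for j in range(3) for l in range(3)], dtype=float
)
cloud = np.vstack([grid, [[100.0, 100.0, 100.0]]])

mask = statistical_filter(cloud, k=4, std_ratio=1.0)
print(mask.sum(), mask[-1])  # 27 False: the grid is kept and the outlier is removed
```

The brute-force distance matrix is quadratic in the number of points, so a spatial index such as a k-d tree would be preferable for clouds of realistic size.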

Basic Principles of Point Cloud Registration
Point cloud registration aligns the coordinate systems of two input point clouds by solving for the spatial transformation matrix between them. A point cloud is a collection of points. We assume the two input point clouds are $X = \{x_i\}$ and $Y = \{y_i\}$, where $x_i$ and $y_i$ are the coordinates of the $i$th points in the point clouds $X$ and $Y$, respectively. Suppose $X$ and $Y$ have $Z$ pairs of correspondences, where the corresponding point set is $D = \{(x_1, y_1), \cdots, (x_Z, y_Z)\}$. The spatial transformation matrix includes rotation, translation, and scaling transformations. The rotation transformation includes the pitch, yaw, and roll angles, the translation transformation includes the translations along the three coordinate axes, and the scaling transformation includes one scaling factor; these are represented by the rotation matrix $R \in \mathbb{R}^{3 \times 3}$, translation vector $t \in \mathbb{R}^{3}$, and scaling factor $f_s$, respectively. Scaling is not considered in rigid registration, so the scaling factor $f_s$ is ignored. The goal of registration is to find the rigid transformation parameters $R$ and $t$ that best align the two point clouds, as shown below:

$$\min_{R,\,t} \sum_{k=1}^{Z} \left\| x_k - (R y_k + t) \right\|_2^2 \tag{1}$$

where $\left\| x_k - (R y_k + t) \right\|_2^2$ is the projection error of the $k$th corresponding point between $X$ and the transformed $Y$. By solving the above optimization problem to minimize the position error between the two point clouds, the optimal spatial transformation ($R$ and $t$) is obtained. When the corresponding points between two point clouds are known, singular value decomposition (SVD) is usually used to solve for the transformation matrix [38].
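For known correspondences, the least-squares problem above has a closed-form solution based on SVD. The following is a minimal sketch of that standard solution (often called the Kabsch method), written with the same convention as the text, in which $Y$ is transformed onto $X$; the function name is illustrative.

```python
import numpy as np


def estimate_rigid_transform(x, y):
    """Closed-form R, t minimising sum_k ||x_k - (R y_k + t)||^2 for paired rows of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)

    # 3 x 3 cross-covariance between the centred source (y) and target (x) points.
    H = (y - y_mean).T @ (x - x_mean)
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = x_mean - R @ y_mean
    return R, t


# Synthetic check: rotate and translate a random cloud, then recover the pose.
rng = np.random.default_rng(0)
y = rng.random((20, 3))
angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.5, -0.2, 1.0])
x = y @ R_true.T + t_true

R_est, t_est = estimate_rigid_transform(x, y)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```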
Traditional registration methods use optimization strategies to estimate the transformation matrix. The most commonly used optimization-based registration method is the ICP algorithm, which contains two stages: correspondence searching and transformation estimation. Correspondence searching finds the matched points between the input point clouds. Transformation estimation estimates the transformation matrix from these correspondences. The two stages are conducted iteratively to find the optimal transformation. If the initial pose differences of the input point clouds are significant, the ICP algorithm struggles to find precise correspondences during the iterative process, and its estimated transformation matrix is also inaccurate. A two-step registration method is therefore adopted in homogeneous point cloud registration, which first roughly aligns the point cloud poses through coarse registration. However, the different imaging mechanisms of multiple sensors pose challenges to multimodal point cloud registration in terms of data format, noise, and resolution differences. Therefore, this study proposes a multi-sensor point cloud registration method that involves three stages of key point extraction, coarse registration, and fine registration to achieve high-precision multi-source point cloud registration.
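The two alternating stages of ICP can be sketched in a few lines. The code below is a textbook point-to-point ICP with brute-force nearest neighbor search and no correspondence rejection; it is not the adaptive-thresholding variant proposed in this work, and the function names, iteration limit, and tolerance are illustrative assumptions.

```python
import numpy as np


def best_rigid(x, y):
    """SVD solution of min sum ||x_k - (R y_k + t)||^2 for paired points."""
    xm, ym = x.mean(axis=0), y.mean(axis=0)
    U, _, Vt = np.linalg.svd((y - ym).T @ (x - xm))
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, xm - R @ ym


def icp(x, y, max_iter=50, tol=1e-12):
    """Align source y to target x; returns R, t and the error history."""
    R, t = np.eye(3), np.zeros(3)
    errors = []
    for _ in range(max_iter):
        # Stage 1: correspondence search for the currently transformed source.
        y_cur = y @ R.T + t
        d2 = ((y_cur[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        errors.append(d2[np.arange(len(y)), nn].mean())
        if len(errors) > 1 and errors[-2] - errors[-1] < tol:
            break
        # Stage 2: transformation estimation from the matched pairs.
        R, t = best_rigid(x[nn], y)
    return R, t, errors


# Synthetic check with a small initial misalignment.
rng = np.random.default_rng(0)
x = rng.random((40, 3))
a = np.deg2rad(3.0)
R_true = np.array([
    [np.cos(a), -np.sin(a), 0.0],
    [np.sin(a), np.cos(a), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.02, -0.01, 0.015])
y = (x - t_true) @ R_true   # chosen so that x = R_true y + t_true

R_est, t_est, errors = icp(x, y)
print(errors[0], errors[-1])  # the mean squared matching error does not increase
```

With a large initial misalignment the nearest neighbor matches become unreliable and this loop can stall in a local optimum, which is the motivation for the coarse registration stage.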

Key Point Extraction with Structure-Intensity Constraints
Key points are points in the point cloud that have significant features, including geometric structure, color, and intensity, which can effectively describe the original point cloud. Compared to the original point cloud, the number of key points is relatively small. In addition, as the points' relative positions remain unchanged during point cloud rotation and translation, the extracted key points have rotational and translational invariance. Therefore, key points can be utilized to replace the original point cloud for registration.
Using key points for point cloud registration can preserve point cloud features, reduce multimodal point cloud heterogeneity, and improve registration efficiency.
The near-field SAR and LiDAR point clouds contain the spatial position coordinates and intensity information of points. Most existing key point detectors extract key points from a single feature, which reduces the descriptive ability of the extracted key points. The centroid-distance (CED) detector [39] has recently been shown to be more effective. Although the CED detector is a multi-feature key point detector that can extract geometric structure and color key points from color point clouds, it does not extract intensity key points, and so cannot be used directly for SAR and LiDAR. Therefore, this study designs a novel detector based on the CED detector that can extract geometric structure and intensity key points, enabling key point extraction from near-field SAR and LiDAR point clouds.
Specifically, our key point extraction calculates the significance of each point within its spherical neighborhood, and then uses non-maximum suppression to retain the points whose significance is higher than that of all neighboring points in the neighborhood, where significance refers to geometric structure and intensity. Assume there is a LiDAR point cloud set Q in which $q = [q^G, q^S]^T$ is one of the points, $q^G = \{x, y, z\}$ is the geometric coordinate of point q, and $q^S$ is the intensity of point q. We set point q as the query point and r as the radius of the spherical neighborhood, and search for all points within the spherical neighborhood to form the set of neighboring points $N_q = \{ q_i \mid \| q^G - q_i^G \|_2 < r \}$ for point q. The first step is to calculate the geometric significance and intensity significance of the points. The geometric centroid of the spherical neighborhood of point q can be obtained by the following equation:
$$c^G = \frac{1}{I} \sum_{i=1}^{I} q_i^G \quad (2)$$
where I is the number of neighboring points of point q. The intensity centroid of the spherical neighborhood of point q can be obtained by the following equation:
$$c^S = \frac{1}{I} \sum_{i=1}^{I} q_i^S \quad (3)$$
Intuitively, the larger the distance from the point to the geometric centroid, the more prominent its geometric significance, as with corner points. Likewise, the greater the intensity difference between the point and the intensity centroid, the more prominent its intensity significance. Therefore, the geometric significance of point q is measured by its distance from the geometric centroid of its spherical neighborhood, as follows:
$$d_G = \| q^G - c^G \|_2 \quad (4)$$
The intensity significance of point q is represented by the L1 norm of the difference between its intensity and the intensity centroid of its spherical neighborhood, as follows:
$$d_S = \| q^S - c^S \|_1 \quad (5)$$
The second step is to obtain key points with high significance. We traverse all points in the point cloud Q and filter out the points with low significance using Equation (6), in which $d_{Gt}$ is the geometric significance threshold and $d_{St}$ is the intensity significance threshold. In order to select points with high geometric and intensity significance within the spherical neighborhood, the non-maximum suppression algorithm [40] is used to screen the key points that meet Equation (7), where $d_{Gi}$ is the geometric significance and $d_{Si}$ is the intensity significance of neighboring point $q_i$.
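The two-step detector described above (centroid-distance significance followed by thresholding and non-maximum suppression) can be sketched as follows. This is a brute-force illustration assuming NumPy; the function name `detect_keypoints` is hypothetical. The way the geometric and intensity tests are combined is an assumption: here a point is kept if it passes the threshold and is the neighborhood maximum for either feature, which may differ from the exact form of Equations (6) and (7).

```python
import numpy as np

def detect_keypoints(xyz, intensity, r, d_gt, d_st):
    """Return indices of key points from (N, 3) coordinates and (N,) intensities."""
    n = len(xyz)
    dists = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=2)
    neigh = dists < r  # spherical neighbourhoods (each point includes itself)
    d_g = np.zeros(n)
    d_s = np.zeros(n)
    for i in range(n):
        idx = np.flatnonzero(neigh[i])
        # Distance to the geometric centroid of the neighbourhood.
        d_g[i] = np.linalg.norm(xyz[i] - xyz[idx].mean(axis=0))
        # Absolute difference to the intensity centroid of the neighbourhood.
        d_s[i] = abs(intensity[i] - intensity[idx].mean())
    keypoints = []
    for i in range(n):
        idx = np.flatnonzero(neigh[i])
        # Assumed rule: threshold test plus non-maximum suppression, per feature.
        geo = d_g[i] > d_gt and d_g[i] >= d_g[idx].max()
        inten = d_s[i] > d_st and d_s[i] >= d_s[idx].max()
        if geo or inten:
            keypoints.append(i)
    return keypoints
```

On five collinear points with unit spacing and r = 1.5, the two end points are returned as geometric key points, and an isolated intensity spike in the middle is returned as an intensity key point. For real point clouds, a k-d tree would replace the dense distance matrix.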

SAC-IA Coarse Registration with SHOT Feature and Geometric Relationship Constraints
After extracting the key points of the near-field SAR and LiDAR point clouds, coarse registration can be performed using these key points to give a good initial pose between the input point clouds. The correspondence searching process of the original SAC-IA algorithm [41] is enhanced with the signature of histograms of orientations (SHOT) feature descriptor [42] and geometric relationship constraints for better SAR-LiDAR coarse registration. The original SAC-IA algorithm relies only on fast point feature histogram (FPFH) feature descriptors [43] to select correspondences, without considering the geometric relationship between correspondences. When the corresponding points are incorrectly selected, a problem of ambiguous rotation angles arises. Such ambiguity, caused by three collinear points, can be overcome by triangular relationship constraints. Furthermore, compared to the FPFH features used by the SAC-IA algorithm, SHOT feature descriptors are more robust to point clouds with incomplete surfaces and uneven density. Therefore, on the basis of the original SAC-IA, this study uses both the SHOT features and the triangular relationships of the corresponding points to constrain the correspondence search.
Before executing our improved coarse registration algorithm, we calculate the SHOT feature descriptors of the key points. The SHOT feature descriptor uses the adjacent points to encode each key point and obtain the corresponding feature vector. SHOT features have rotation and translation invariance and can be used for correspondence selection in point cloud registration. The steps for constructing SHOT feature descriptors are as follows.
First, we build a unique coordinate system centered on the key point. For the key point $q' \in Q$, we construct the covariance matrix $E_S$ of point $q'$ in a spherical neighborhood with a search radius of $r_s$ via the following equations, where $q' = \{x', y', z'\}$ denotes only the geometric coordinate of the key point.
$$E_S = \frac{1}{\sum_{q'_i \in N_{q'}} (r_s - d_i)} \sum_{q'_i \in N_{q'}} (r_s - d_i)(q'_i - q')(q'_i - q')^T \quad (8)$$
$$d_i = \| q'_i - q' \|_2 \quad (9)$$
where $N_{q'}$ represents the set of all points within the spherical neighborhood of point $q'$, $q'_i \in N_{q'}$, and $\| \cdot \|_2$ is the L2 norm. Eigenvalue decomposition is performed on the covariance matrix $E_S$ to obtain the corresponding unit eigenvectors $x^+, y^+, z^+$ in order of decreasing eigenvalue. The unit vectors in the opposite directions are $x^-, y^-, z^-$. $G(k)$ represents the index set of the k points in the spherical neighborhood whose distances from point $q'$ are closest to the median distance $d_m = \mathrm{median}_i \, \| q'_i - q' \|_2$, $i = 1, \cdots, |N_{q'}|$. In order to eliminate the sign ambiguity caused by eigenvalue decomposition when constructing the unique coordinate system, the positive direction of the X-axis is determined using Equations (10)-(14).
Equations analogous to (10)-(14) are used to determine the positive direction of the local coordinate system's Z-axis, and the positive direction of the local coordinate system's Y-axis is then obtained through $y = z \times x$.
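The construction of the unique coordinate system can be sketched as follows, assuming NumPy. The function name `local_reference_frame` is hypothetical. Two assumptions are made: the covariance is weighted by the distance to the sphere boundary, as in the original SHOT formulation, and the sign of each axis is chosen to point toward the majority of neighbors. The tie-breaking step that uses the median-distance subset G(k) is omitted for brevity, so ties are resolved in favor of the positive direction.

```python
import numpy as np

def local_reference_frame(center, neighbors, r_s):
    """Return a 3x3 matrix whose rows are the x, y, z axes of the local frame."""
    diff = neighbors - center
    dist = np.linalg.norm(diff, axis=1)
    w = r_s - dist  # closer neighbours receive larger weights
    cov = (w[:, None] * diff).T @ diff / w.sum()
    _, eigvec = np.linalg.eigh(cov)  # eigenvalues in ascending order
    x_axis = eigvec[:, 2]  # direction of the largest eigenvalue
    z_axis = eigvec[:, 0]  # direction of the smallest eigenvalue

    def disambiguate(axis):
        # Point the axis toward the side that holds the majority of neighbours.
        n_pos = np.count_nonzero(diff @ axis >= 0)
        return axis if n_pos >= len(diff) - n_pos else -axis

    x_axis = disambiguate(x_axis)
    z_axis = disambiguate(z_axis)
    y_axis = np.cross(z_axis, x_axis)  # y = z x x completes a right-handed frame
    return np.vstack([x_axis, y_axis, z_axis])
```

Because the axis signs depend only on where the neighbors lie, the frame rotates with the point cloud, which is what makes the descriptor built on it rotation invariant.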
Second, we encode adjacent points based on the unique coordinate system above to obtain SHOT features. Point $q'$ is the origin of the unique coordinate system, and its spherical neighborhood space is divided into two parts along the radial direction, eight parts in the vertical direction, and two parts in the horizontal direction, resulting in a total of 32 feature subspaces. Equation (15) is used to calculate the cosine of the angle $\theta_i$ between the unit normal vector $\vec{n}_i$ of the adjacent point $q'_i$ falling into the region and the positive direction $\vec{z}_{q'}$ of the unique coordinate system's Z-axis in each subspace.
$$\cos\theta_i = \vec{n}_i \cdot \vec{z}_{q'} \quad (15)$$
In each subspace, the range of cosine values is divided into 11 parts to form a local histogram, and the adjacent points are assigned to different cells of the local histogram based on their cosine values. After the local histograms of all subspaces are integrated, the boundary effect is handled using quadrilinear interpolation to obtain the SHOT descriptor of the point, totaling 32 × 11 = 352 dimensions.
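The layout of the 352-dimensional descriptor can be illustrated with a short sketch. It assumes that each neighbor has already been assigned a subspace index (0 to 31) and a cosine value; the names `cosine_bin` and `shot_histogram` are hypothetical. The sketch uses hard binning and a final L2 normalization, and it omits the interpolation step, so it shows the indexing scheme rather than a complete SHOT implementation.

```python
N_SUBSPACES = 32  # spatial subspaces, as stated in the text
N_BINS = 11       # cosine bins per subspace

def cosine_bin(cos_theta, n_bins=N_BINS):
    """Map a cosine in [-1, 1] to a histogram bin index in [0, n_bins - 1]."""
    b = int((cos_theta + 1.0) / 2.0 * n_bins)
    return min(max(b, 0), n_bins - 1)

def shot_histogram(samples):
    """Accumulate (subspace_index, cos_theta) samples into a 352-D vector."""
    hist = [0.0] * (N_SUBSPACES * N_BINS)
    for subspace, cos_theta in samples:
        hist[subspace * N_BINS + cosine_bin(cos_theta)] += 1.0
    # Assumed normalisation to unit length, which reduces sensitivity to density.
    norm = sum(v * v for v in hist) ** 0.5
    return [v / norm for v in hist] if norm > 0 else hist
```

Each subspace owns a contiguous run of 11 entries, so a neighbor in subspace 3 whose normal is opposite to the Z-axis (cosine of -1) contributes to entry 33.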
Then, the SAC-IA algorithm with SHOT feature-geometric relationship dual constraints facilitates coarse registration and aids in determining the initial pose transformation matrix K between the key points of near-field 3D SAR and LiDAR.
In the correspondence searching stage, we select s = 3 sample points among the key points of the SAR point cloud P, where the distance between the points is greater than a distance threshold so as to ensure that the SHOT features of the selected sample points are different. For each sample point, a nearest neighbor search is used to find the three key points in point cloud Q with the smallest difference in SHOT features, and one of these three points is randomly selected as the corresponding point. Assuming the set of corresponding key points obtained is $\{ (p'_a, q'_a) \mid p'_a \in P, q'_a \in Q, a = 1, 2, 3 \}$, we calculate the edge lengths between the sample points in P and between their corresponding points in Q for the index pairs $(a, b) \in \{ (1, 2), (2, 3), (3, 1) \}$. A triangle judgment is performed on the calculated edge lengths of the corresponding points, followed by a congruence judgment on the two obtained triangles. If the triangle condition and the congruence condition are not both met, we reselect the sample points. In addition, because multimodal point clouds have different resolutions, a threshold τ is set to relax the edge congruence judgment condition. In the transformation estimation stage, the obtained s = 3 correspondences are used to solve the rigid transformation matrix between point clouds P and Q through SVD, and the Huber penalty function is used to calculate the distance error sum $\sum_{a=1}^{s} H(e_a)$ after the rigid transformation, as follows:
$$H(e_a) = \begin{cases} \frac{1}{2} e_a^2, & \| e_a \| \le t_d \\ \frac{1}{2} t_d \left( 2 \| e_a \| - t_d \right), & \| e_a \| > t_d \end{cases}$$
where $t_d$ is the set distance error threshold, $e_a$ is the distance error of the $a$th corresponding point after transformation, $\| \cdot \|$ represents the L1 norm, and $H(e_a)$ is the distance error after imposing the Huber penalty on $e_a$. During the iteration of the above two operations, the transformation matrix is retained whenever the current distance error sum is the smallest so far, and the initial pose transformation matrix of the input point clouds is obtained when the iteration ends:
$$K = \begin{bmatrix} R' & t' \\ 0 & 1 \end{bmatrix}$$
where $R'$ is the spatial rotation matrix and $t'$ is the translation vector.
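The geometric checks and the error measure used in this stage can be sketched in plain Python. All function names are hypothetical. The congruence test shown here, a relative tolerance τ on each pair of corresponding edges, is an assumption about how the threshold is applied; the Huber penalty follows the usual SAC-IA form.

```python
import math

def edge_lengths(pts):
    """Edge lengths of the triangle formed by three 3D points, for pairs (1,2), (2,3), (3,1)."""
    return [math.dist(pts[a], pts[b]) for a, b in ((0, 1), (1, 2), (2, 0))]

def is_valid_triangle(edges, eps=1e-9):
    """Reject collinear (degenerate) samples with the triangle inequality."""
    a, b, c = sorted(edges)
    return a + b > c + eps

def is_congruent(edges_p, edges_q, tau):
    """Assumed tolerance test: corresponding edges differ by at most a fraction tau."""
    return all(abs(lp - lq) <= tau * max(lp, lq) for lp, lq in zip(edges_p, edges_q))

def huber(e, t_d):
    """Huber penalty: quadratic for small errors, linear beyond the threshold t_d."""
    e = abs(e)
    return 0.5 * e * e if e <= t_d else 0.5 * t_d * (2.0 * e - t_d)
```

A candidate triple would be accepted only if both triangles pass `is_valid_triangle` and `is_congruent` holds; the retained transformation is the one with the smallest sum of `huber` values over the correspondences. The linear tail of the penalty limits the influence of a single badly matched point.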

ICP Fine Registration with Adaptive Threshold
After the coarse registration of key points, the initial pose transformation matrix K is obtained. Then, the improved ICP algorithm is proposed to accurately align the original point clouds of near-field SAR and LiDAR, and the precise pose transformation matrix $K'$ is obtained.
It should be noted that there is a disparity in resolution between the two types of point clouds. LiDAR point clouds typically exhibit higher intensity levels and closer proximity between adjacent points compared to SAR point clouds. During correspondence searching, the original ICP algorithm only needs to select the corresponding points with the minimum distance, provided that this distance is less than the given distance threshold. Therefore, the distance threshold for the judgment is fixed. During the iterative optimization process, the ICP algorithm approaches the optimal solution, while the distance between searched corresponding points also decreases. A fixed distance threshold will introduce more corresponding points during the later iterations, resulting in an increase in registration time.
To maintain accuracy while simultaneously improving registration efficiency, an adaptive threshold is adopted. The method replaces the fixed distance threshold with an adaptive one, which increases continuously as the iterative optimization process progresses. Initially, a smaller threshold is used to capture fine-grained correspondences and refine the alignment. As the optimization progresses and the point clouds become closer to alignment, the threshold increases, allowing for faster convergence while still ensuring accurate registration.
The steps of the improved ICP algorithm are as follows. In the parameter initialization stage, we obtain the transformed point cloud $P^{(0)}$ after coarse registration by $P^{(0)} = R'P + t'$, and set the initial distance threshold $d_t^{(0)}$, the overall distance error threshold ε, and the maximum number of iterations $M_{iter}$. In the correspondence searching stage of the $i$th iteration, for each $p_j \in P^{(i-1)}$, we find the point $q_j \in Q$ closest to $p_j$ through a nearest neighbor search, where $p_j$ and $q_j$ denote only the geometric coordinates of the points. If $\| p_j - q_j \|_2 \le d_t^{(i)}$, $p_j$ and $q_j$ form a correspondence pair, where $d_t^{(i)}$ is the distance threshold of the $i$th iteration. The final corresponding point set $\{ (p_j, q_j) \mid p_j \in P, q_j \in Q, j = 1, \cdots, J \}$ is obtained. In the transformation estimation stage of the $i$th iteration, we calculate the centroids of the two point clouds in the corresponding point set, denoted as $\mu_p$ and $\mu_q$, using the following equations:
$$\mu_p = \frac{1}{J} \sum_{j=1}^{J} p_j, \qquad \mu_q = \frac{1}{J} \sum_{j=1}^{J} q_j$$
We construct the covariance matrix $E_J = \frac{1}{J} \sum_{j=1}^{J} (p_j - \mu_p)(q_j - \mu_q)^T$ and perform SVD on the covariance matrix using Equation (21).
$$E_J = U \Sigma V^T \quad (21)$$
where U and V are 3 × 3 orthogonal matrices, and Σ is a diagonal matrix composed of the singular values of the covariance matrix $E_J$. Further, the rotation matrix $R^{(i)}$ and the translation vector $t^{(i)}$ can be obtained as follows:
$$R^{(i)} = V U^T \quad (22)$$
$$t^{(i)} = \mu_q - R^{(i)} \mu_p \quad (23)$$
The transformed point cloud $P^{(i)}$ can be obtained by $P^{(i)} = R^{(i)} P^{(i-1)} + t^{(i)}$. The distance error function F is calculated using Equation (24).
If $F \le \varepsilon$ or the maximum number of iterations has been reached, we stop the iteration. Otherwise, we update the distance threshold using a constant ρ. We use the new point cloud $P^{(i)}$ to return to the correspondence searching stage and continue with the next iteration. After the iteration ends, the final spatial rotation matrix $R_{SL}$ and translation vector $t_{SL}$ of the near-field SAR and LiDAR point clouds are obtained by composing the coarse registration result with the transformations of all iterations, where $M_{end}$ is the number of iterations at termination. The final spatial transformation matrix is as follows:
$$K'_{SL} = \begin{bmatrix} R_{SL} & t_{SL} \\ 0 & 1 \end{bmatrix}$$
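The loop described above can be sketched as follows, assuming NumPy and a brute-force nearest neighbor search. The function name `icp_adaptive` is hypothetical. The threshold update is written as a multiplication by the constant ρ, which is an assumption about the update rule; the error F is taken as the mean squared distance of the retained pairs, and the reflection guard is an added safeguard.

```python
import numpy as np

def icp_adaptive(P, Q, d0, rho, eps=1e-10, max_iter=50):
    """Align source P (N, 3) to target Q (M, 3); returns the accumulated R, t."""
    R_acc, t_acc = np.eye(3), np.zeros(3)
    P_cur, d_t = P.copy(), d0
    for _ in range(max_iter):
        # Correspondence search: nearest target point, gated by the threshold.
        dist = np.linalg.norm(P_cur[:, None, :] - Q[None, :, :], axis=2)
        nn = dist.argmin(axis=1)
        keep = dist[np.arange(len(P_cur)), nn] <= d_t
        if keep.sum() < 3:
            break  # too few pairs to estimate a rigid transformation
        p, q = P_cur[keep], Q[nn[keep]]
        # Transformation estimation via SVD of the cross-covariance matrix.
        mu_p, mu_q = p.mean(axis=0), q.mean(axis=0)
        E = (p - mu_p).T @ (q - mu_q) / len(p)
        U, _, Vt = np.linalg.svd(E)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_q - R @ mu_p
        P_cur = P_cur @ R.T + t
        R_acc, t_acc = R @ R_acc, R @ t_acc + t
        F = np.mean(np.sum((P_cur[keep] - q) ** 2, axis=1))
        if F <= eps:
            break
        d_t *= rho  # assumed multiplicative update of the distance threshold
    return R_acc, t_acc
```

In the pipeline of this section, P would be the SAR point cloud after coarse registration and Q the LiDAR point cloud; composing the returned pair with the coarse result gives the final transformation.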

Camera-LiDAR Point Cloud Registration
Compared with near-field SAR and LiDAR point cloud registration, LiDAR and optical color point cloud registration follows a similar process. However, optical point clouds contain target size distortion that affects the registration performance, and they lack the intensity information required by the proposed coarse registration algorithm with geometric and intensity dual constraints. Therefore, two special treatments are applied to optical point clouds, namely, format conversion and size correction. First, we convert the color of the optical color point cloud into intensity via the equation given in [44], where r, g, and b represent the red, green, and blue components of the optical color, respectively. After obtaining the optical intensity point cloud, the approach uses the LiDAR point cloud, which truly reflects the target size, as the size correction benchmark to address the target size distortion. Principal component analysis (PCA) [45] is used to correct the size of the optical point cloud. The calibration steps are as follows.
Assuming the LiDAR point cloud is L and the optical point cloud is O, we first calculate the centroids of point clouds L and O, and then calculate the covariance matrices $E_L$ and $E_O$ together with their eigenvectors. Then, cross-product orthogonalization is performed on the linearly independent eigenvector basis to obtain the orthogonal basis and form the feature space.
We calculate the rotation matrices $R_L$, $R_O$ and translation vectors $t_L$, $t_O$ for converting point clouds L and O from the original coordinate system to the feature space coordinate system. The point clouds are converted to the feature space coordinate systems through Equation (36), giving $L_f$ and $O_f$. We assume $r_L$, $r_O$ are the coordinate ranges of point clouds $L_f$ and $O_f$ in the orthogonal basis direction corresponding to the maximum eigenvalue, and the scaling factor of the optical point cloud is calculated from these two ranges. After the format conversion and size correction of the optical point cloud, algorithms similar to those for key point extraction, coarse registration, and fine registration in Section 3.2 are used to obtain the spatial transformation matrix between the LiDAR and camera point clouds, as follows:
$$K'_{CL} = \begin{bmatrix} R_{CL} & t_{CL} \\ 0 & 1 \end{bmatrix}$$
where $R_{CL}$ and $t_{CL}$ represent the rotation matrix and translation vector that convert the camera point cloud to the LiDAR point cloud.
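The PCA-based size correction can be sketched as follows, assuming NumPy. The names `principal_extent` and `correct_size` are hypothetical, and the scaling factor is assumed to be the ratio of the LiDAR extent to the optical extent along the principal axis with the largest eigenvalue, applied about the centroid of the optical point cloud.

```python
import numpy as np

def principal_extent(points):
    """Coordinate range along the principal axis with the largest eigenvalue."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    _, eigvec = np.linalg.eigh(cov)  # eigenvalues in ascending order
    main_axis = eigvec[:, -1]
    proj = centered @ main_axis
    return proj.max() - proj.min()

def correct_size(optical, lidar):
    """Scale the optical cloud about its centroid to match the LiDAR extent."""
    f_s = principal_extent(lidar) / principal_extent(optical)  # assumed ratio
    mu = optical.mean(axis=0)
    return (optical - mu) * f_s + mu, f_s
```

Because the extent is measured in each cloud's own principal direction, the estimate does not depend on the relative pose of the two clouds, so the correction can run before registration.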

SAR-Camera-LiDAR Data Fusion
After point cloud registration, the following steps are used for multimodal data fusion. First, the LiDAR coordinate system serves as the reference coordinate system to align the multi-sensor point cloud coordinate systems. By registering near-field SAR point clouds with LiDAR point clouds, the spatial transformation matrix $K'_{SL}$ is obtained. By registering optical color point clouds with LiDAR point clouds, the spatial transformation matrix $K'_{CL}$ is obtained. Assuming the near-field SAR point cloud of the target is $P_S$, the LiDAR point cloud is $P_L$, and the optical color point cloud is $P_C$, multimodal point cloud coordinate alignment can be achieved through Equation (38).
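Applying a 4 × 4 spatial transformation matrix to a point cloud can be sketched as follows, assuming NumPy and points stored as rows. The helper names `to_homogeneous` and `apply_transform` are hypothetical.

```python
import numpy as np

def to_homogeneous(R, t):
    """Assemble a 4x4 homogeneous matrix from a rotation matrix and a translation."""
    K = np.eye(4)
    K[:3, :3] = R
    K[:3, 3] = t
    return K

def apply_transform(K, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (homo @ K.T)[:, :3]
```

With the LiDAR frame as the reference, the SAR and optical point clouds would each be passed through `apply_transform` with their own registration matrix, while the LiDAR point cloud is left unchanged.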
Then, the optical point clouds and SAR point clouds are blended to color the objects, ensuring that the fused point clouds carry both scattering and color information. The near-field SAR point cloud is colored based on scattering intensity to reflect the scattering information of objects. Each point in the optical point cloud is traversed, and the closest point in the near-field SAR point cloud is found by the nearest neighbor search algorithm. If the distance between the two points is less than the set threshold, we replace the color of the optical point with the color of the near-field SAR point. Otherwise, the optical color is retained.
Finally, the redundancy in SAR point clouds is removed by deleting outliers relative to optical and LiDAR point clouds, and the multimodal fusion results with near-field SAR scattering intensity, precise geometric shape size, and color texture information are obtained.
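The color replacement rule of the fusion step can be sketched in plain Python. The function name `fuse_colors` is hypothetical, colors are assumed to be RGB tuples, and the brute-force search stands in for whatever nearest neighbor structure is used in practice.

```python
import math

def fuse_colors(optical_pts, optical_rgb, sar_pts, sar_rgb, threshold):
    """Return per-point colours for the optical cloud after SAR colour replacement."""
    fused = []
    for p, rgb in zip(optical_pts, optical_rgb):
        # Brute-force nearest neighbour in the SAR cloud.
        j = min(range(len(sar_pts)), key=lambda k: math.dist(p, sar_pts[k]))
        # Use the SAR scattering colour only when the nearest SAR point is close enough.
        fused.append(sar_rgb[j] if math.dist(p, sar_pts[j]) < threshold else rgb)
    return fused
```

Optical points that have no SAR point within the threshold keep their true color, so the fused cloud shows scattering intensity where the radar responded and texture elsewhere.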

Experimental Results
This section describes the multi-sensor prototype experimental hardware system from which the measured data were collected, and discusses the results obtained with the multimodal data fusion framework presented in the previous section. This section is organized into four parts. Section 4.1 presents and evaluates SAR-LiDAR registration results for the proposed point cloud registration algorithm, and Section 4.2 presents camera-LiDAR registration results. Section 4.3 demonstrates the SAR-camera-LiDAR multimodal data fusion results based on the proposed framework. Finally, Section 4.4 discusses the applications of the current work and shows relevant experimental results. The computer system used to test the method has an Intel i7-10700 CPU, an RTX 2070s graphics card, and 64 GB of RAM.

SAR-LiDAR Registration Results
Manual corresponding point selection is used to obtain the rotation matrix $R_g$ and translation vector $t_g$ as registration truth values. Table 1 presents the quantitative evaluation results of the proposed improved registration method and the original registration method. The evaluation indicators are registration error and registration time, whereby registration error includes rotation error $E_R$ and translation error $E_t$ [46].
Table 1. Comparison of registration error and time of point cloud registration algorithms before and after improvement.
Here, $t_g$ represents the true value of the registration translation vector, $t_e$ represents the value of the translation vector estimated by the registration method, $R_g$ represents the true value of the registration rotation matrix, $R_e$ represents the value of the rotation matrix estimated by the registration method, and tr(·) is the matrix trace operation.
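A common way to compute these two indicators is sketched below, assuming NumPy. The rotation error is taken as the angle of the residual rotation obtained from the matrix trace, and the translation error as the Euclidean distance between the two vectors; both definitions, the use of degrees, and the function names are assumptions rather than the exact formulas of [46].

```python
import numpy as np

def rotation_error_deg(R_g, R_e):
    """Angle (degrees) of the residual rotation R_g^T R_e."""
    c = (np.trace(R_g.T @ R_e) - 1.0) / 2.0
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

def translation_error(t_g, t_e):
    """Euclidean distance between the true and estimated translation vectors."""
    return float(np.linalg.norm(np.asarray(t_g) - np.asarray(t_e)))
```

Under these definitions, an estimate that differs from the truth by a pure 10° rotation about any axis yields a rotation error of 10, regardless of the axis.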
By comparing the performance of the proposed improved method with that of the original two-step registration method (SAC-IA+ICP) in Table 1, it can be concluded that the improved method outperforms the original method in terms of registration error and efficiency. The original method can achieve rough alignment results in the registration of the pincer and satellite models, but its rotation error is significantly large in the registration of the two aircraft models, while the improved method can achieve accurate alignment on all targets. On the one hand, since the improved method extracts key points that better reflect the targets' structural characteristics compared to general points, it can provide more accurate corresponding point pairs in the correspondence searching stage of registration. On the other hand, the FPFH descriptor used in the original method is less effective in describing SAR point clouds than the SHOT descriptor, and the geometric relationship constraints also minimize the impact of rotation angle ambiguity.
In addition, we quantitatively compare the performance of the proposed method with current mainstream registration methods, including Super4PCS [47], ICP, NDT [48], and CPD [49]. Table 2 shows the registration errors of these methods, demonstrating that the proposed method achieves the lowest rotation angle errors in all experiments, with competitive performance in translation errors. The Super4PCS algorithm fails in registering the pincer, possibly because the pincer is thin and therefore resembles a planar target. This results in the Super4PCS algorithm extracting a large set of erroneous points. The ICP algorithm requires a good initial pose. With a poor initial pose, it easily fails to register and converges to a locally optimal solution. When registering the two aircraft models and the pincer, it shows a large rotation angle deviation compared to the other methods. Both NDT and CPD are registration methods based on hypothetical probability distribution models. When the shape difference between the two point clouds is significant, incorrect matching may occur due to the different point distributions. It can be seen that LiDAR can accurately describe the geometric shapes of all targets, while there are different levels of shape missing in SAR point clouds. Therefore, relying on the precise positioning of LiDAR and the scattering information of SAR, the four targets can be accurately aligned in position, and the scattering characteristics of each target can also be clearly located in the registration results of the near-field SAR point cloud and LiDAR point cloud. However, LiDAR point clouds lack color and texture information, and so it is still necessary to fuse optical photos to assist the target category judgment.

Camera-LiDAR Registration Results
After the format conversion and size correction of the optical point cloud, the proposed method is used to register the LiDAR point cloud with the optical point cloud. Figure 8 shows the registration results of each target's LiDAR point cloud and optical point cloud, verifying that the proposed method is also applicable to the registration of LiDAR and optical color point clouds. LiDAR point clouds are colored in white, while optical point clouds are colored in their true colors.

SAR-Camera-LiDAR Data Fusion Results
The proposed multimodal fusion framework utilizes pairwise registration results to unify the near-field 3D SAR image and optical color point cloud into the LiDAR coordinate system, and then obtains the fusion results by attaching color and scattering intensity to the aligned point cloud. As shown in Figure 9, the near-field SAR image of each target and the corresponding SAR-camera-LiDAR data fusion results are presented.
By integrating the precise geometric sizes of LiDAR point clouds and the color information of optical color point clouds, scattering characteristics can be accurately located and target categories can be intuitively determined. Multi-sensor data fusion not only reduces the difficulty of SAR scattering characteristics diagnosis, but it also improves the efficiency of SAR image interpretation. In the fusion image, it can be clearly seen that the scattering at the head of aircraft model 1 is weak, while the scattering on both wings is strong. The scattering in the middle part of the passenger plane model is strong, while the scattering in the head and tail is weak. The pincer has high scattering characteristics in all parts and the SAR imaging contour is clear. The two wings of the satellite model have strong scattering, and there is also scattering at the vertical connecting rod. However, from the satellite model fusion image, it can also be seen that some of the satellite SAR scattering characteristics are lost due to the holes in the optical color point cloud generated during SFM reconstruction. Therefore, high-quality optical color point clouds need to be obtained in the future.

Multimodal Fusion Application Experiment
In order to demonstrate the advantages of multimodal data fusion in near-field SAR applications, application experiments have been conducted for concealed target detection and fault detection.
Figure 10 shows the experimental results of concealed target detection.Aircraft model 1 and aircraft model 2 hidden in a cardboard box are placed in the experimental scene.The millimeter wave near-field array 3D SAR imaging system can penetrate the cover (cardboard box) to image hidden targets (passenger plane model).By integrating the precise geometric sizes of LiDAR point clouds and the color information of optical color point clouds, scattering characteristics can be accurately located and target categories can be intuitively determined.Multi-sensor data fusion not only reduces the difficulty of SAR scattering characteristics diagnosis, but it also improves the efficiency of SAR image interpretation.In the fusion image, it can be clearly seen that the scattering at the head of the aircraft model 1 is weak, while the scattering on both wings is strong.The scattering in the middle part of the passenger plane model is strong, while the scattering in the head and tail is weak.The pincer has high scattering characteristics in all parts and the SAR imaging contour is clear.The two wings of the satellite model have strong scattering, and there is also scattering at the vertical connecting rod.However, from the satellite model fusion image, it can also be seen that some of the satellite SAR scattering characteristics are lost due to the holes in the optical color point cloud generated during SFM reconstruction.Therefore, high-quality optical color point clouds need to be obtained in the future.

Multimodal Fusion Application Experiment
In order to demonstrate the advantages of multimodal data fusion in near-field SAR applications, application experiments have been conducted for concealed target detection and fault detection.
Figure 10 shows the experimental results of concealed target detection.Aircraft model 1 and aircraft model 2 hidden in a cardboard box are placed in the experimental scene.The millimeter wave near-field array 3D SAR imaging system can penetrate the cover (cardboard box) to image hidden targets (passenger plane model).
applications, application experiments have been conducted for concealed target detection and fault detection.
Figure 10 shows the experimental results of concealed target detection.Aircraft model 1 and aircraft model 2 hidden in a cardboard box are placed in the experimental scene.The millimeter wave near-field array 3D SAR imaging system can penetrate the cover (cardboard box) to image hidden targets (passenger plane model).The results in Figure 10 show that, generally, noise often appears in 3D SAR images, which can mistakenly be considered as a target.However, by fusing the results from Li-DAR, the proposed method can accurately locate the true position of the target and avoid including incorrect positions.Traditionally, this judgement relies on manual expert experience.The results also show that even a sheltered target can be detected.The LiDAR can observe the shelter wherein hidden targets are located, but cannot observe the hidden targets.The SAR-LiDAR fusion result can clearly depict the positions of hidden targets relative to the cover, which is beneficial for the application of concealed target detection.
Figure 11 shows the results of the fault detection experiment.Fault detection usually refers to the scattering enhancement of the target fault area, which is detected by the nearfield 3D SAR imaging system [50].It mainly finds possible fault areas by comparing the imaging results of the non-fault target and the fault target.The scattering from the head The results in Figure 10 show that, generally, noise often appears in 3D SAR images, which can mistakenly be considered as a target.However, by fusing the results from LiDAR, the proposed method can accurately locate the true position of the target and avoid including incorrect positions.Traditionally, this judgement relies on manual expert experience.The results also show that even a sheltered target can be detected.The LiDAR can observe the shelter wherein hidden targets are located, but cannot observe the hidden targets.The SAR-LiDAR fusion result can clearly depict the positions of hidden targets relative to the cover, which is beneficial for the application of concealed target detection.
Figure 11 shows the results of the fault detection experiment.Fault detection usually refers to the scattering enhancement of the target fault area, which is detected by the nearfield 3D SAR imaging system [50].It mainly finds possible fault areas by comparing the imaging results of the non-fault target and the fault target.The scattering from the head of aircraft model 1 is weak, indicating good stealth performance.Therefore, as shown in Figure 11a, a rivet is placed on the head of aircraft model 1 as the fault target.The scattering intensity of the rivet is relatively high, simulating targets with degraded stealth performance (with faults).In the results of radar in Figure 11b, a challenge due to the unique scattering characteristics of the target exists.Specifically, the head and nose parts of the aircraft model are missed when comparing the results from other sensors.This presents a difficulty related to identifying the part of the aircraft with a stealth performance fault.This identification is critical in determining the severity of the fault and planning appropriate solutions.However, a combination of other data helps to overcome this hurdle.Precise information on the physical structure of an aircraft can be obtained using shape and position data provided by LiDAR.This is further supplemented by the colorful texture information gathered from the camera, which provides a more detailed and visually rich representation of the aircraft's surface.This integrated approach makes the identification and location of faults considerably more straightforward and accurate, enhancing the overall effectiveness of our inspection and maintenance process.As shown in Figure 11c, the fault location can be located precisely and more intuitively by comparing the SAR-LiDAR fusion result.And Figure 11d shows the fault can be identified more intuitively by the LiDAR-camera fusion result.

Conclusions
This work employs multimodal data fusion for the first time to enhance the perception ability of near-field 3D SAR, leveraging the complementary strengths of multiple sensors (the precise object localization of LiDAR and the color information of the camera). To address the difficulty of aligning coordinate systems caused by differences in data format, noise, and resolution during SAR-camera-LiDAR data fusion, a three-step coarse-to-fine point cloud registration method is designed for our multimodal fusion framework. The method begins with a CED key point extraction algorithm with structure-intensity dual constraints, which extracts key points for subsequent registration. Next, the coarse registration step integrates SHOT feature and geometric relationship dual constraints into the SAC-IA algorithm to generate a rough spatial transformation matrix that provides a better initial pose. The subsequent fine registration uses an ICP algorithm with adaptive thresholds, achieving precise alignment of the multi-sensor point clouds through an accurate spatial transformation matrix. The experimental results demonstrate that the proposed method achieves state-of-the-art registration results in both quantitative and qualitative evaluations, showing promising potential for advanced applications such as RCS measurement and concealed object detection in near-field 3D SAR scenarios.
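To make the fine registration step more concrete, the following is a minimal sketch of an ICP loop whose correspondence rejection threshold is re-estimated at every iteration. The adaptive rule used here (mean plus `k` standard deviations of the current nearest-neighbor distances) is an assumption chosen for illustration and is not taken from the paper; the function names are hypothetical, and a brute-force neighbor search stands in for a k-d tree. The sketch also assumes that a coarse initial pose has already been applied to `src`.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t with R @ src_i + t ~= dst_i for paired points (Kabsch/SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c


def nearest(cur, dst):
    """Brute-force nearest neighbor in dst for every point of cur."""
    d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(cur)), idx])


def icp_adaptive(src, dst, max_iter=50, k=2.0, tol=1e-10):
    """ICP whose rejection threshold is recomputed from the residuals each iteration.

    Assumed rule (not from the paper): keep pairs with distance <= mean + k * std.
    Returns the accumulated rotation, translation, and final nearest-neighbor RMSE.
    """
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    prev_rmse = np.inf
    for _ in range(max_iter):
        idx, dist = nearest(cur, dst)
        keep = dist <= dist.mean() + k * dist.std()
        R, t = best_rigid_transform(cur[keep], dst[idx[keep]])
        cur = cur @ R.T + t
        # Compose the incremental update with the transform accumulated so far.
        R_total, t_total = R @ R_total, R @ t_total + t
        rmse = np.sqrt((nearest(cur, dst)[1] ** 2).mean())
        if abs(prev_rmse - rmse) < tol:
            break
        prev_rmse = rmse
    return R_total, t_total, rmse
```

Tying the threshold to the current residuals lets it tighten automatically as the clouds converge, so early iterations tolerate a rough initial pose while later ones reject outlying SAR points.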
Regarding the limitations of our current work, the near-field 3D SAR and LiDAR point clouds are obtained from single-perspective measurements, which restricts their ability to comprehensively perceive and interpret scenes. Hence, future work will explore the reconstruction of multi-view near-field 3D SAR point clouds and corresponding multi-sensor data fusion methods to improve modeling and perception. Moreover, current learning-based methods have demonstrated impressive performance in handling point cloud data, for example in feature description and matching [51]. The next phase of our fusion framework will adopt these learning-based processing methods in place of the existing ones.

Figure 1. Near-field array 3D SAR data acquisition, imaging results and preprocessing results. (a) Results for aircraft model 1; (b) results for aircraft model 2; (c) results for pincer; (d) results for satellite model.

Figure 2. LiDAR data acquisition, imaging results and preprocessing results. (a) Results for aircraft model 1; (b) results for aircraft model 2; (c) results for pincer; (d) results for satellite model.

Remote Sens. 2024, 16

Figure 3. Camera data acquisition, imaging results and preprocessing results. (a) Results for aircraft model 1; (b) results for aircraft model 2; (c) results for pincer; (d) results for satellite model.

Figure 4. The experiment scene of the near-field array 3D SAR imaging system.
Figure 2 exhibits the scene of LiDAR point cloud acquisition, the original LiDAR point clouds, and the results obtained through the LiDAR preprocessing process detailed in Section 3.1.2. Figure 3 depicts multi-view 2D optical image acquisition, the original 3D reconstruction results, and the results of the optical point cloud preprocessing process detailed in Section 3.1.3.

Figure 5. The overall flowchart of the proposed near-field SAR multimodal fusion framework.

3.1. Data Preprocessing

3.1.1. Near-Field SAR Preprocessing

The preprocessing of the near-field 3D SAR image to extract targets is shown in Figure 6a. The near-field 3D SAR image is generated using the BP algorithm. It is then con-

Figure 6. The data preprocessing pipeline used in our proposed multimodal fusion framework. (a) Specific near-field SAR data preprocessing operations; (b) specific LiDAR data preprocessing operations; (c) specific camera data preprocessing operations.

Figure 7 shows the registration results of the proposed method for near-field SAR point clouds and LiDAR point clouds. The LiDAR point cloud is displayed in white, and the near-field array 3D SAR point cloud is colored according to scattering intensity. The spatial positions of the near-field SAR and LiDAR point clouds before registration are marked with ellipses, and the details of both point clouds are displayed in a white box in the middle of the image. LiDAR accurately describes the geometric shapes of all targets, whereas the SAR point clouds are missing parts of the shape to varying degrees. Therefore, by relying on the precise positioning of LiDAR and the scattering information of SAR, the four targets can be accurately aligned in position, and the scattering characteristics of each target can be clearly located in the registration results. However, LiDAR point clouds lack color and texture information, so it is still necessary to fuse optical photos to assist in judging the target category.
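Once the two point clouds share a frame, combining LiDAR geometry with SAR scattering information can be pictured as attaching a SAR amplitude to each LiDAR point. The sketch below is only an illustration under stated assumptions: the function name `paint_lidar_with_sar`, the `radius` parameter, and the nearest-neighbor assignment rule are hypothetical and are not described in the paper.

```python
import numpy as np


def paint_lidar_with_sar(lidar_xyz, sar_xyz, sar_amp, radius=0.03):
    """Give each LiDAR point the amplitude of its nearest registered SAR point.

    LiDAR points with no SAR point within `radius` receive NaN, so a viewer can
    render them in a neutral color (for example white, as for plain LiDAR points).
    """
    d2 = ((lidar_xyz[:, None, :] - sar_xyz[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(lidar_xyz)), idx])
    fused = np.full(len(lidar_xyz), np.nan)
    hit = dist <= radius
    fused[hit] = sar_amp[idx[hit]]
    return fused
```

The brute-force distance matrix keeps the example dependency-free; for point clouds of realistic size a spatial index such as a k-d tree would be the usual choice.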

Figure 7. Near-field SAR point clouds and LiDAR point clouds before and after registration. (a) Aircraft model 1 before and after registration; (b) aircraft model 2 before and after registration; (c) pincer before and after registration; (d) satellite model before and after registration.

Figure 8. Optical point clouds and LiDAR point clouds before and after registration. (a) Aircraft model 1 before and after registration; (b) aircraft model 2 before and after registration; (c) pincer before and after registration; (d) satellite model before and after registration.

Figure 9. Near-field 3D SAR images and corresponding multimodal fusion results. (a) Aircraft model 1 before and after multimodal fusion; (b) aircraft model 2 before and after multimodal fusion; (c) pincer before and after multimodal fusion; (d) satellite model before and after multimodal fusion.

Figure 10. Application experiment of concealed target detection. (a) Near-field SAR image, LiDAR point cloud, and optical image of the experiment scene; (b) front view, left view, and top view of the fusion image of near-field SAR and LiDAR.

Figure 11. Application experiment of fault detection. (a) Optical image layout for fault detection experiment (left: without fault; right: with fault); (b) near-field 3D SAR imaging results (left: without fault; right: with fault); (c) near-field 3D SAR-LiDAR fusion results (left: without fault; right: with fault); (d) LiDAR-camera fusion result and multimodal fusion result without fault. The white circles in the figure indicate where the faults are set in the experiment.

Table 2. Comparison of registration errors between the proposed method and other point cloud registration methods.