Image Registration and Fusion of Visible and Infrared Integrated Camera for Medium-Altitude Unmanned Aerial Vehicle Remote Sensing

Abstract: This study proposes a novel method for image registration and fusion via commonly used visible light and infrared integrated cameras mounted on medium-altitude unmanned aerial vehicles (UAVs). The innovation of image registration lies in three aspects. First, it reveals how a complex perspective transformation can be converted to simple scale and translation transformations between two sensor images under long-distance and parallel imaging conditions. Second, with the introduction of metadata, a scale calculation algorithm is designed according to spatial geometry.

Medium-altitude unmanned aerial vehicles (UAVs) are an important information acquisition platform in the integrated Earth observation network [1]. UAVs offer the advantages of flexibility and rapid response. Compared with manned aerial vehicles, medium-altitude UAVs can work in high-risk areas to accomplish detection missions. They are also capable of flying long distances and feature a wide detection range and an operation time that lasts longer than that of low-altitude UAVs. Medium-altitude UAVs play an irreplaceable role in normal observation, disaster monitoring, and battlefield detection applications.
Visible light cameras and infrared cameras are the most commonly used imaging devices in medium-altitude UAVs. Visible light imaging offers the advantages of intuitive impression, rich information, and high resolution, but it is susceptible to low-visibility atmospheric conditions. By contrast, infrared imaging is not significantly affected by atmospheric conditions, and it can identify hidden or disguised heat source targets. Given the complementarity of these two types of cameras, most UAVs are equipped with visible light and infrared integrated cameras.

Utility of Visible and Infrared Image Fusion
With the development of imaging sensors, image fusion has become a hot research topic in image processing, pattern recognition, and computer vision. Image fusion combines different sets of information from two or more images of a given scene acquired in different situations with one or multiple sensors [2]. In the past decade, visible and infrared image fusion has been widely used in both military and civil applications. In the military, visible and infrared image fusion plays an increasingly important role in UAV autonomous navigation [3], target detection [4], environment perception [5], and military information monitoring [6]. In the civilian realm, many applications, including national environmental protection [7], agricultural remote sensing [8], wildlife multispecies remote sensing [9], safety surveillance [10], and saliency detection [11,12], have benefited significantly from the information enhancement provided by visible and infrared image fusion.

Problems of Visible and Infrared Image Registration and Fusion for UAV Applications
Registration and fusion are two of the most crucial technologies in the image fusion applications mentioned above.
Image registration [13] is the process of matching two or more images obtained at different times by different sensors (imaging equipment) or under different conditions (weather, illumination, position, and perspective); this technology has been widely used in computer vision, pattern recognition, medical image analysis, and remote sensing image analysis. Compared with homologous image registration, the registration of visible and infrared images involves particular difficulties. First, remote sensing images of the same area obtained by different sensors show different resolutions, pixel values, spectral phases, and scene characteristics because of their different imaging mechanisms. Second, the particularities of medium-altitude UAV imaging adversely affect image registration. Visible images may be degraded under long-distance imaging conditions because of atmospheric effects, which can reduce the number of extracted image features. Large motion between image frames can increase the time consumed by image search.
The purpose of image fusion is to process multi-source redundant data in space and time according to certain algorithms, obtain more accurate and more abundant information than any single dataset, and generate combination images with new space, spectrum, and time characteristics. Image fusion is not only a simple combination of data; it also emphasizes the optimization of information to highlight useful and interesting information and eliminate or suppress irrelevant information. Despite the availability of many image fusion algorithms, improving the resulting image resolution and enhancing the saliency of interesting areas in images remain problematic.

Image Registration
Popular registration methods usually depend on image information. These methods can be divided into the following two categories according to their similarity measures: intensity-based methods and feature-based methods. Intensity-based methods include gray information-based methods and transform domain-based methods.
Gray information-based methods measure similarity using the gray statistical information of the image itself. These algorithms are convenient to implement, but their application scope is narrow, and their computational cost is large. Correlation methods can match input images with similar scale and gray information [14,15]. A novel and robust statistic as a similarity measure for robust image registration was proposed in [14]. The statistic is called the increment sign correlation because it is based on the average evaluation of the incremental tendency of brightness in adjacent pixels. Tsin and Kanade [15] extended the correlation technique to point set registration using a method called kernel correlation. Another classical registration approach is based on mutual information. Mutual information is obtained by calculating the entropy of two variables and their joint entropy, and it can be used in image registration. On the basis of traditional mutual information registration, Zhuang et al. [16] proposed a novel hybrid algorithm that combines the particle swarm optimization (PSO) algorithm and the Powell search method to obtain improved performance in terms of time and precision. In [17], a novel infrared and visual image registration method based on phase grouping and the mutual information of gradient orientation was presented. The visible and infrared registration method proposed in [18] combines a bilateral filter and cross-cumulative residual entropy.
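Mutual-information registration, as used in [16-18], scores a candidate alignment by the statistical dependence between the gray values of the two images. The measure itself can be sketched with a joint histogram (NumPy only; the bin count is an illustrative choice, not taken from the cited papers):

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Mutual information between two equally sized gray images.

    MI = H(A) + H(B) - H(A, B), estimated from a joint histogram.
    """
    hist_2d, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = hist_2d / hist_2d.sum()          # joint probability
    px = pxy.sum(axis=1)                   # marginal of img_a
    py = pxy.sum(axis=0)                   # marginal of img_b
    nz = pxy > 0                           # avoid log(0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz]))
```

In a registration loop, the transformation parameters that maximize this score over the overlap region are taken as the alignment; an image compared with itself yields a much higher score than the same image compared with unrelated data.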
Image registration methods based on the transform domain mostly use the Fourier transform. They are limited by the invariance properties of the Fourier transform and are thus only suitable for transformations with Fourier-domain counterparts (rotation, translation, etc.). Pohit and Sharma [19] developed an algorithm based on the Fourier slice theorem to measure the simultaneous rotation and translation of an object in a 2D plane. Niu H. et al. [20] proposed a novel method based on the combination of the fractional Fourier transform (FRFT) and a conventional phase correlation technique. Compared with conventional fast Fourier transform-based methods, the FRFT employed by the proposed method contains both spatial and frequency information. Li, Zhang, and Hu [21] proposed a registration scheme for multispectral systems using phase correlation and scale-invariant feature matching. This scheme uses the phase correlation method to calculate the parameters of a coarse-offset relationship between different band images and then detects scale invariant feature transform (SIFT) points for image matching. In addition to the Fourier transform, a uniform space was used in a new registration method for non-rigid images proposed in [22]. The key point is normalized mapping, which transforms any image into an intermediate space. In this uniform space, the anatomical feature points of different images are matched via rotation and scaling.
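The phase correlation technique referred to above recovers a pure translation from the normalized cross-power spectrum of the two images. A minimal NumPy sketch, assuming a cyclic (wrap-around) shift:

```python
import numpy as np

def phase_correlation(ref, moved):
    """Estimate the (dy, dx) translation such that
    np.roll(moved, (dy, dx), axis=(0, 1)) aligns with `ref`.

    The normalized cross-power spectrum of two shifted images is a pure
    phase ramp; its inverse FFT peaks at the translation offset.
    """
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(moved)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12         # keep phase only
    corr = np.abs(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap offsets larger than half the image into negative shifts
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return dy, dx
```

Because the spectrum is normalized to unit magnitude, the correlation surface is a near-ideal impulse, which makes the peak location robust to global intensity differences between the two images.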
Feature-based methods are the most common category in image registration. These methods depend on image points [23-26], line segments [27,28], regions [29], and other features [30], and they show a wide range of applications. SIFT [23,24] is one of the most widely used features with satisfactory performance. Several studies [25] improved, extended, and deepened SIFT-based visible and infrared image registration. An image registration method based on speeded up robust features was proposed in view of the slow speed of SIFT [26]. In [27], a new general registration method for images of varying nature was presented: edge images are processed to extract straight linear segments, which are then grouped to form triangles. To solve the feature matching problem, wherein the interest points extracted from both images are not always identical, Han et al. [28] emphasized the geometric structure alignment of features (lines) instead of focusing on descriptor-based individual feature matching. In [29], Liu et al. proposed an edge-enhanced, maximally stable extremal region method for multi-spectral image registration. An image registration method based on visually salient (VS) features was introduced in [30]. A VS feature detector based on a modified visual attention model was presented to extract VS points. This detector combines the information of an infrared image and its negative image to overcome the contrast reversal problem between visible and infrared images, thereby facilitating the search for corresponding points in visible/infrared images.
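Point-feature methods such as SIFT typically pair descriptors by nearest-neighbor search with Lowe's ratio test, which discards ambiguous matches. The sketch below shows only that matching step on generic descriptor arrays (the function name and the ratio value are illustrative, not from the cited papers):

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.75):
    """Match two descriptor sets with Lowe's ratio test.

    For each descriptor in `desc_a`, keep its nearest neighbour in
    `desc_b` only if it is clearly closer than the second nearest.
    Returns a list of (index_a, index_b) pairs.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)  # Euclidean distances
        j1, j2 = np.argsort(dist)[:2]              # two nearest neighbours
        if dist[j1] < ratio * dist[j2]:
            matches.append((i, int(j1)))
    return matches
```

In a full pipeline, the surviving pairs would feed a robust estimator (e.g., RANSAC) to compute the image transformation.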
Other new methods have emerged in addition to these three types, including the diffusion map-based method [31], the alignment metric-based method [32], the hybrid image feature-based method [33], the nonsubsampled contourlet transform (NSCT) and gradient mirroring-based method [34], and the random projection and sparse representation-based method [35]. Some of these studies achieved good results in visible and infrared image registration, and they provide new ideas for solving the problem of multimodal image registration.
These studies achieved great successes in the area of image registration. However, most of them are based only on image information and attempt to establish correspondence between visible and infrared images, thereby establishing a matching transformation between the two images. In fact, they explore two vital issues: homonymy feature detection and feature matching. Given the different spectra and imaging mechanisms, homonymy feature detection is a difficult problem for multimodal images. From the aerial perspective, the transformation between two sets of image features is required to meet perspective invariance, which increases the difficulty of image feature matching.
For UAV applications, image registration methods still depend on image information despite the rapid development of visible and infrared sensors. Rich metadata from imaging sensors and other equipment of UAV systems are insufficiently exploited.

Image Fusion
Image fusion can be conducted at three different levels, namely, the pixel, feature, and decision levels [36]. This study mainly explores pixel level-based fusion methods.
Image fusion methods based on the pixel level are traditionally divided into spatial domain methods and transform domain methods. Spatial domain-based methods operate directly on the gray values of images; they mainly include the gray weighted method, principal component analysis (PCA) method [37], color mapping method [38], contrast or gray adjustment method, Markov random field method [39], Bayesian optimization method [40], double modal neural network method [41], and pulse coupled neural network (PCNN) method [42]. In transform domain fusion, the images are first transformed into the transform domain space, and the coefficients are then fused. These methods mainly include the Laplace pyramid transform-based method [43], wavelet transform-based method [44], ridgelet transform-based method [45], contourlet transform-based method [46], NSCT-based method [47], compressed sensing-based method [48], and sparse representation-based method [49].
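As one concrete example of the spatial-domain family, the PCA method weights the source images by the leading eigenvector of their joint covariance, so the more informative image contributes more to the result. The sketch below is a generic illustration of that idea, not the specific implementation of [37]:

```python
import numpy as np

def pca_fuse(img_a, img_b):
    """Fuse two gray images of equal size with PCA weighting.

    The weights are the (normalized) components of the leading
    eigenvector of the 2x2 covariance of the two images.
    """
    data = np.stack([img_a.ravel(), img_b.ravel()])
    cov = np.cov(data)                     # 2x2 covariance matrix
    vals, vecs = np.linalg.eigh(cov)       # eigh: ascending eigenvalues
    v = np.abs(vecs[:, -1])                # leading eigenvector
    w = v / v.sum()                        # weights summing to 1
    return w[0] * img_a + w[1] * img_b
```

Because the weights are nonnegative and sum to one, the fused image is a pointwise convex combination of the inputs.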
In recent years, several scholars introduced effective methods for multi-modality image fusion. Zhang et al. [50] presented a systematic review of the sparse representation-based multi-sensor image fusion literature, which highlighted the pros and cons of each category of approaches. Han et al. [51] presented a saliency-aware fusion algorithm for integrating infrared and visible light images (or videos) to enhance the visualization of the latter. The algorithm involves saliency detection followed by biased fusion. The goal of saliency detection is to generate a saliency map for the infrared image to highlight the co-occurrence of high brightness values and motion. Markov random fields are used to combine these two sources of information. Liu et al. [52] introduced a novel method to fuse infrared and visible light images based on region segmentation, which is used to determine important regions and background information in the input images.
For UAV applications, visible light sensors can capture relatively abundant spectral information with clear texture and high spatial resolution, but in poor light conditions, image quality declines significantly. By contrast, infrared sensors can penetrate smoke and fog and perform effective detection under poor light conditions; however, the obtained images show low contrast, fuzzy scenes, and poor details. Based on the requirements of UAV applications, the fusion of visible and infrared images needs to combine the two types of image feature data. Such a method can obtain scene information with high spatial resolution while highlighting interesting target areas.

Present Work
This study aims to develop a method of visible and infrared image registration and fusion for medium-altitude UAV applications. The research scope applies to widely used visible light and infrared integrated cameras and covers both registration and fusion.
In image registration, our method attempts to solve the problem at the UAV system level instead of using image information alone. Three main problems are studied. The first problem is the transformation between two images under long-distance aerial imaging with visible light and infrared integrated cameras. The second problem is the use of the rich metadata of UAV systems, in addition to image information, to estimate the transformation between visible and infrared images. The third problem is the detection and matching of homonymy features in multimodal images to obtain precise image registration with the aid of metadata.
Based on image registration, image fusion for UAV applications should not only obtain high spatial resolution and extensive scene information but also highlight interesting target areas. Thus, a new pixel level-based image fusion method using PCNN and NSCT is examined in this study.

UAV System with a Visible Light and Infrared Integrated Camera
In this study, we employ a medium-altitude UAV, which is used in earthquake emergency and rescue to collect images of disaster areas effectively and accurately with the aid of imaging devices (Figure 1). The specific parameters are described in Table 1.
Remote Sens. 2017, 9, 441

A visible light and infrared integrated camera platform is mounted on the front belly of the UAV, as shown in Figure 2. The two optical axes of the visible and infrared imaging sensors are parallel. The visible image resolution is 1392 × 1040, and the infrared image resolution is 640 × 512. The UAV features three degrees of freedom (DOF), and the imaging device features two DOF relative to the UAV body. Equipped with GPS (Global Positioning System), INS (Inertial Navigation System), and an altimeter, the UAV can measure position and orientation. These types of visible light and infrared integrated cameras have been widely used for medium-altitude UAVs. Therefore, our research shows extensive application potential and practical value.

Long-Distance Integrated Parallel Vision
According to the visible light and infrared integrated camera of a medium-altitude UAV, this study attempts to reveal the principle of integrated parallel vision. Most medium-altitude UAV systems are mounted with visible light and infrared integrated cameras, which integrate two types of sensors, as shown in Figure 2. In the integrated structure, the optical axes of the visible sensor and infrared sensor are parallel to each other, and the imaging model can be approximated as a pinhole model [53] under the condition of long-distance imaging over thousands of meters.
Figure 3 shows that the image planes of the two sensors are parallel to each other and the two optical axes are also parallel. With camera rotation, the two sensors always point in the same direction, and they have a common field of view (FOV), which is reflected as an overlapping area in the two images. In aerial images, the transformation between two image planes should generally be described by a perspective transformation. However, under long-distance imaging conditions, only scale transformation and translation transformation exist between the visible and infrared images obtained from the integrated camera at the same moment.

The assumption is that the visible and infrared image planes are parallel to the ground, similar to the imaging relationship principle. Line a_g b_g c_g d_g represents the FOV of the two sensors, and line b_g c_g is the common FOV. f_v and f_i are the focal lengths of the two sensors. O_v and O_i are the two foci. D_a is the distance between the two imaging axes. D_vg and D_ig denote the distances from the image planes to the ground. Based on the pinhole imaging principle, Equations (1) and (2) are obtained according to triangle similarity.
D_vg and D_ig are approximately equal under long-distance imaging conditions. D_g can be introduced to represent the distance from the image plane to the ground in Equation (3).
Then, Equation (4) can be inferred, where k is a constant. This equation proves that the overlapping regions c_i b_i and c_v b_v have the same direction and scale. Hence, only translation transformation and scale transformation exist between the two image planes.
According to the above analysis, a complex perspective transformation in image registration can be converted to scale and translation transformations under long-distance integrated parallel vision. This principle is applicable to all visible light and infrared integrated cameras of medium-altitude UAVs. It avoids the conventional approach of solving the perspective transformation directly via image feature detection and matching, which is difficult in most cases and sometimes impossible because of the different imaging mechanisms of multimodal images.
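Equations (1)-(4) themselves were lost in extraction; from the similar-triangle argument and the symbols defined above, a plausible reconstruction is:

```latex
\frac{b_v c_v}{b_g c_g} = \frac{f_v}{D_{vg}} \quad (1)
\qquad
\frac{b_i c_i}{b_g c_g} = \frac{f_i}{D_{ig}} \quad (2)

D_{vg} \approx D_{ig} = D_g \quad (3)

\frac{b_v c_v}{b_i c_i}
  = \frac{f_v}{D_g} \cdot \frac{D_g}{f_i}
  = \frac{f_v}{f_i} = k \quad (4)
```

The ratio in (4) depends only on the two focal lengths, which is why the scale between the two images is a constant independent of flight altitude under the long-distance assumption.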

Visible and Infrared Image Registration
According to the long-distance integrated parallel vision described in Section 2.2.1, only scale transformation and translation transformation exist between the visible image and the infrared image. The transformation from the infrared image to the visible image can be expressed as Equation (5), where I_v denotes a visible image and I_i denotes an infrared image. M is the transformation matrix from the infrared image to the visible image; it is composed of two parts, namely, the scale matrix M_S and the translation matrix M_T, which are defined in Equations (6) and (7), where s_x, s_y, t_x, and t_y are transformation parameters. The translation matrix M_T is solved in the two steps of Equation (8) to improve efficiency and accuracy, where M_Tc is the coarse registration matrix from the visible image to the infrared image based on metadata and M_Tp is the precise registration matrix based on the image matching method.
Accordingly, the problem of visible and infrared image registration can be decomposed into scale calculation, coarse translation estimation, and precise translation estimation. The overall solution process is shown in Figure 4.
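The decomposition M = M_S · M_T with M_T split into coarse and precise parts can be written with homogeneous 2D matrices. A short NumPy sketch (the numeric offsets and scales are illustrative values, not from the paper):

```python
import numpy as np

def scale_matrix(s_x, s_y):
    """Homogeneous 2D scale matrix M_S (cf. Equation (6))."""
    return np.array([[s_x, 0.0, 0.0],
                     [0.0, s_y, 0.0],
                     [0.0, 0.0, 1.0]])

def translation_matrix(t_x, t_y):
    """Homogeneous 2D translation matrix M_T (cf. Equation (7))."""
    return np.array([[1.0, 0.0, t_x],
                     [0.0, 1.0, t_y],
                     [0.0, 0.0, 1.0]])

# Two-step translation (cf. Equation (8)): a coarse metadata-based
# estimate refined by image matching. Pure translations commute, so
# the order of the two translation factors does not matter.
M_Tc = translation_matrix(100.0, -40.0)   # illustrative coarse offset
M_Tp = translation_matrix(3.0, 2.0)       # illustrative precise correction
M = scale_matrix(2.0, 2.0) @ (M_Tp @ M_Tc)
```

Applying M to a homogeneous pixel coordinate (x, y, 1) carries an infrared-image point onto the visible-image plane in one matrix product.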

Scale calculation is based on spatial geometry using the pixel size and focal length of the two sensors. Translation calculation is divided into metadata-based coarse translation estimation and image-based precise translation estimation. In coarse translation estimation, the transformation from the image plane to the ground plane is established according to the theory of photogrammetry and coordinate transformation. We then attempt to detect the same-name points of the two images in the ground coordinate system through geographical information and obtain the translation from the infrared image center to the visible image center. Precise translation estimation is based on image features. Edge features are selected for their good structure expression in multimodal images to ensure accuracy and computational efficiency in registration.
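The two-stage translation estimate can be illustrated with a small sketch: a coarse (dy, dx) offset, here assumed to come from metadata, is refined by maximizing the correlation of simple gradient-magnitude edge maps over a local search window. This illustrates the idea only; it is not the paper's exact edge feature or matching criterion:

```python
import numpy as np

def edge_map(img):
    """Simple gradient-magnitude edge map (finite differences)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def precise_translation(vis, ir, coarse, radius=4):
    """Refine a coarse (dy, dx) translation by maximizing the
    correlation of edge maps over a small search window."""
    ev, ei = edge_map(vis), edge_map(ir)
    best, best_score = tuple(coarse), -np.inf
    for dy in range(coarse[0] - radius, coarse[0] + radius + 1):
        for dx in range(coarse[1] - radius, coarse[1] + radius + 1):
            shifted = np.roll(ei, (dy, dx), axis=(0, 1))
            score = np.sum(ev * shifted)   # edge-map correlation
            if score > best_score:
                best, best_score = (dy, dx), score
    return best
```

Restricting the search to a window around the coarse estimate is what makes the metadata step pay off: the expensive image-based search covers only a few pixels instead of the whole frame.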

Visible and Infrared Image Fusion
To meet the four requirements of UAV image fusion, namely, preserving color information, adding infrared brightness information, improving spatial resolution, and highlighting target areas, this study presents a new image fusion method based on NSCT and PCNN. The main features of the method include the following:
1. The IHS transform is used to extract H and S to preserve the color information, and the NSCT multi-scale decomposition is designed to resolve the declining resolution of fusion images caused by the direct substitution of the I channel.
2. The lowpass sub-band of the infrared image obtained via NSCT decomposition is processed by gray stretch to enhance the contrast between the target and the background and highlight the interesting areas.
3. In view of the PCNN neuron's synchronous pulse and global coupling characteristics, which can realize automatic information transmission and fusion, an algorithm for visible and infrared bandpass sub-band fusion based on the PCNN model is proposed.
The process of visible and infrared image fusion based on PCNN and NSCT is shown in Figure 5.
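NSCT and PCNN have no standard off-the-shelf implementations, so the sketch below substitutes a box-filter low/high-pass split for the NSCT sub-bands and a simple additive recombination for the PCNN fusion rule, purely to illustrate the surrounding steps (gray stretch of the infrared lowpass band, preservation of visible detail). It is an illustration under those substitutions, not the proposed algorithm:

```python
import numpy as np

def gray_stretch(band, lo=0.02, hi=0.98):
    """Linear gray stretch between two quantiles, enhancing the
    contrast of the infrared lowpass band."""
    a, b = np.quantile(band, [lo, hi])
    return np.clip((band - a) / (b - a + 1e-12), 0.0, 1.0)

def lowpass(img, k=5):
    """Box-filter lowpass as a stand-in for the NSCT lowpass sub-band."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def fuse_intensity(i_vis, i_ir):
    """Fuse visible and infrared intensity channels: the stretched
    infrared lowpass carries bright targets, while the visible
    highpass keeps spatial detail."""
    low = gray_stretch(lowpass(i_ir))
    high = i_vis - lowpass(i_vis)          # visible detail band
    return np.clip(low + high, 0.0, 1.0)
```

In the full method, the fused intensity would replace the I channel before the inverse IHS transform, so the H and S channels of the visible image are preserved unchanged.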


Metadata
Metadata represents a type of telemetry data produced simultaneously with images in a UAV system. The most useful parameters are listed in Table 2. The parameter of terrain height is acquired from the geographic information system installed in a ground or airborne computer. Camera installation translations are measured with special equipment before flight. Other parameters come from airborne position and orientation sensors, such as GPS, INS, and the altimeter.

For image matching, one image should be scaled to the other. According to spatial geometry, the scale transformation M_S is related only to the pixel size and focal length, as expressed in Equation (9), where s_i and s_v denote the pixel sizes of the infrared sensor and visible light sensor, respectively, and f_i and f_v represent the two focal lengths. Using M_S, the infrared image I_i(x_i, y_i) can be transformed to the scale-transformed image I_iS(x_iS, y_iS), which lies on the same plane as the visible image I_v(x_v, y_v), by employing Equation (10).
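Equation (9) itself was lost in extraction; from the statement that the scale depends only on pixel sizes and focal lengths, the natural reconstruction is a scale factor s = (s_i · f_v)/(s_v · f_i), i.e., the ratio of the two sensors' ground footprints per pixel. A sketch with illustrative (not the paper's) sensor values:

```python
import numpy as np

def scale_factor(s_i, s_v, f_i, f_v):
    """Reconstructed Equation (9): scale from the infrared image to the
    visible image plane, from pixel sizes (s) and focal lengths (f)."""
    return (s_i * f_v) / (s_v * f_i)

# Hypothetical sensor parameters (illustrative values only):
s = scale_factor(s_i=15e-6, s_v=6.45e-6, f_i=50e-3, f_v=100e-3)

# Equation (10): an infrared pixel (x_i, y_i) maps to (s * x_i, s * y_i)
# in the scale-transformed image I_iS.
x_iS, y_iS = s * 320.0, s * 256.0
```

With such parameters the infrared image is upscaled by a factor of several before translation matching, which is consistent with the resolution gap between the 640 × 512 infrared and 1392 × 1040 visible frames.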

Metadata-Based Coarse Translation Estimation
Based on the theory of coordinate transformation [54,55], this section proposes a method for estimating the transformation between the visible image and the infrared image using image metadata. This estimation is coarse, but it can eliminate the global motion between frames, reduce the matching range of image registration, and greatly improve efficiency.

Five Coordinate Systems
Coordinate transformation is the key aspect in the whole process of coarse translation estimation.The following five coordinate systems are used as basis, as shown in Figure 6.According to different calculation modes, the value of I z could be set as the focal length of camera CCS is the image coordinate system represented by physical units with respect to the center of the image as the origin of the coordinate system, in which axis C X and axis C Y are parallel to axis I X and the axis I Y .Axis C Z is upward along the optical axis direction.In this system, the unit is generally in meters.
• Plane Coordinate System (PCS) The origin of the PCS is the center of the GPS device.In PCS, the direction of the axis P X is positive when it points to the head of the plane, axis P Y is perpendicular to axis p X on the body plane, and P Z is positive when it points upward.
The origin of the NCS is coincident with the origin of the PCS.The direction of axis N X is positive when it points north, the direction of axis N Y is positive when it points to the east, and axis N Z points up.
ICS is defined as a rectangular coordinate system, which is related to pixels.The top left corner of the image is considered the coordinate system origin.The values of x I , y I are related to the physical size of the row u and column v of the image.The relationship is established by pixel size s.According to different calculation modes, the value of z I could be set as the focal length of camera f or −f.
• Camera Coordinate System (CCS) The CCS is the image coordinate system represented in physical units, with the center of the image as the origin. Axis X C and axis Y C are parallel to axis X I and axis Y I , and axis Z C points upward along the optical axis direction. In this system, the unit is generally the meter.
• Plane Coordinate System (PCS) O P − X P Y P Z P The origin of the PCS is the center of the GPS device. In the PCS, the direction of axis X P is positive when it points to the head of the plane, axis Y P is perpendicular to axis X P on the body plane, and Z P is positive when it points upward.
• NCS The origin of the NCS coincides with the origin of the PCS. The direction of axis X N is positive when it points north, the direction of axis Y N is positive when it points east, and axis Z N points upward.
• GCS The Gauss-Kruger projection is used in the GCS. The coordinates (x G , y G ) form the plane rectangular coordinate system in which national mapping uses the Gauss-Kruger 3° or 6° zone projection, and z G is the absolute altitude. The GCS is a left-handed rectangular space coordinate system.

Metadata-Based Coordinate Transformation
Based on the five coordinate systems, the transformation from image I I in the ICS to image I G in the GCS is implemented through coordinate system transformation. The process is as follows: ICS → CCS → PCS → NCS → GCS. The transformations between the above coordinate systems comprise translations and rotations, which can be expressed as Equations (11) and (12), respectively.
where T x and T y are translation parameters, and α, β, and γ are the three rotation parameters about the X, Y, and Z axes. The coordinate transformations in our UAV system are listed in Table 3; they can be calculated with Equations (11) and (12) using the relevant metadata. Assuming that any ground point in the ICS, NCS, and GCS is denoted as (x I , y I , z I ), (x N , y N , z N ), and (x G , y G , z G ), respectively, and that the imaging center O is denoted analogously in each system, these values can be computed via coordinate transformation. Given that the NCS is parallel to the GCS, we can obtain the following formula using the collinear equation according to the central projection model shown in Equation (13).
Then, we can obtain the transformation of any point from the ICS to the GCS via Equations (14) and (15), where Z I = −f. λ is a coefficient that can be eliminated during computation, and f T represents the transformation from image I I in the ICS to image I G in the GCS.
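The elementary operations behind Equations (11) and (12) — a translation plus the three axis rotations, composed along ICS → CCS → PCS → NCS → GCS — can be sketched as below. The function names are ours; the angles and offsets would come from the metadata in Table 3.

```python
import math

def rot_x(a):  # rotation by angle a about the X axis (Equation (12))
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(b):  # rotation by angle b about the Y axis
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(g):  # rotation by angle g about the Z axis
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def transform(R, t, p):
    # One step of the chain: rotate point p by R, then translate by t
    # (the combined form of Equations (11) and (12)).
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i]
                 for i in range(3))
```

Successive coordinate systems are handled by feeding the output of one `transform` call into the next with that link's own rotation and translation.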

Coordinate Transformation-Based Coarse Translation Estimation
Given the same mode of central projection, the coordinate transformation is applicable to both the visible image and the infrared image. According to the inverse process of Equations (14) and (15), we can conveniently obtain the corresponding pixel positions in the visible image and the infrared image of any point in the GCS. The overlapping image of the two sensors in the GCS is denoted as I iv G (x iv G , y iv G ), and the corresponding visible image and infrared image in the ICS are denoted as I v I (x v I , y v I ) and I i I (x i I , y i I ), respectively. Equation (16) can then be established, where f Tv −1 and f Ti −1 represent the transforms from the GCS to the ICS of the two sensors; they have different expressions because of the different parameters of the two sensors. Accordingly, the coarse translation estimation M Tc from the scale-transformed infrared image to the visible image can be calculated using Equation (17).
Based on the scale calculation in Section 2.3.2, M Tc can be considered the translation from the center of the scale-transformed infrared image I iS I (x i I , y i I ) to the center of the original visible image I v I (x v I , y v I ).

Edge Detection of Visible and Infrared Images
According to current studies, lines and edges are robust features for representing scene structure information, and they are widely applied to scene registration and modeling. As described in a study on video analysis [56], line features play an important role in fast 3D camera modeling. In the present study, edge features are used in visible and infrared image registration. The Canny operator [57] is one of the most popular edge detection algorithms. As the scene and illumination of visible and infrared images change frequently, the high and low thresholds of the Canny operator often need to change, and fixed thresholds lead to poor self-adaptation. In many cases, the conventional Canny operator cannot obtain a satisfying detection result. In the present work, a self-adaptive threshold Canny operator is used to detect enough real edges while avoiding disconnected or false edges [58].

Edge Distance Field Transformation of Visible Image
As a result of different imaging mechanisms, the edge features of visible and infrared images show different characteristics. In the visible image, the edges appear relatively smooth, complete, and less noisy. In the infrared image, the edges appear incomplete, rough, and noisy, as shown in Figure 7. The edges of the visible and infrared images are thus roughly the same, but some details are slightly biased; this can be defined as the non-strictly aligned characteristic of edges.
Remote Sens. 2017, 9, 441

To adapt to the non-strictly aligned characteristic of edges, this study proposes a new registration method based on a Gaussian distance field. This method extends the edge range with a certain weight and converts conventional edge-to-edge registration into edge-to-field registration, which is effective for non-strict matching.
Using the edge detection algorithm of Section 2.4.1, we can extract the edge feature image I ve from the original visible image I v , with the edge pixel value being 255 and the non-edge pixel value being 0. In the edge feature image, the distance transformation of a point is defined as the distance from the nearest edge point to the point itself, as shown in Equation (18).
where d(p, p e ) represents the distance between a point p in the distance field map of the visible image and a point p e in the visible edge image I ve . Given that points far from the edge exert little effect on edge registration, the distance transformation should only be performed in an edge-centered band region. Specifically, the band threshold is set to R, and the distance transformation values of all pixels larger than R are set to R + 1 via Equation (19).
In image matching, D(p) can be used to measure the similarity between a point in the infrared image and a point in the visible image. A small value equates to a great matching probability, which can be expressed with the Gaussian model shown in Equation (20):
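The band-limited distance field of Equations (18)-(20) can be sketched as follows. For brevity this sketch uses a multi-source BFS with the chessboard metric rather than an exact Euclidean transform, which is an implementation choice of ours, as is the value of sigma.

```python
import math
from collections import deque

def distance_field(edges, R):
    # Equations (18)-(19): distance to the nearest edge pixel (value 255),
    # computed only in an edge-centered band; values beyond the band
    # threshold R stay clamped at R + 1.
    h, w = len(edges), len(edges[0])
    D = [[R + 1] * w for _ in range(h)]
    q = deque()
    for y in range(h):
        for x in range(w):
            if edges[y][x] == 255:
                D[y][x] = 0
                q.append((x, y))
    while q:
        x, y = q.popleft()
        if D[y][x] >= R:          # stop expanding at the band boundary
            continue
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and D[ny][nx] > D[y][x] + 1:
                    D[ny][nx] = D[y][x] + 1
                    q.append((nx, ny))
    return D

def gauss_weight(d, sigma=2.0):
    # Equation (20): small distance -> high matching probability.
    return math.exp(-d * d / (2.0 * sigma * sigma))
```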


Similarity for Registration
Assuming that the template image to be registered, I iet , is extracted from the infrared edge image I ie , the similarity between I iet and the corresponding template image I veft from the visible distance field map I vef can be expressed using Equation (21), where p(x, y) is any point in I iet and f(D(p)) is the function of the distance field transformation [59].

Infrared Template Image Extraction
Given that the edge distribution of the infrared image is unknown, the infrared template image I iet should be automatically extracted for matching. The position of I iet can be calculated using Equation (22), where N is the number of edge pixels in the infrared edge map I ie and (x, y) is any edge point. As shown in Figure 9, the width and height of I iet are defined as w and h, respectively. On the x-axis, the edge pixels in the interval [x iet − 0.5w, x iet + 0.5w] occupy a certain proportion of the total edge pixels of I ie ; the edge pixels in the interval [y iet − 0.5h, y iet + 0.5h] account for the same proportion on the y-axis.
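Equation (22) locates the template at the centroid of the infrared edge pixels; a minimal sketch, with the function name ours:

```python
def template_center(edge_map):
    # Equation (22): (x_iet, y_iet) is the mean position of the N edge
    # pixels (value 255) in the infrared edge map I_ie.
    xs = ys = n = 0
    for y, row in enumerate(edge_map):
        for x, v in enumerate(row):
            if v == 255:
                xs += x
                ys += y
                n += 1
    return (xs / n, ys / n)
```

The width w and height h would then be grown around this center until the enclosed edge pixels reach the chosen proportion of the total, a selection rule we leave out here.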


Searching Algorithm Based on Particle Swarm Optimization
As shown in Figure 9, a searching algorithm is used to find the best matching position in the distance field map of the visible edge I vef according to the similarity of the template image I iet and the template image I veft extracted from I vef . The time consumption of conventional window searching should be reduced, and the occasional accuracy deviation of the metadata caused by large motions of the UAV body or camera should be addressed. A novel searching algorithm with a time-varying inertia weight is therefore proposed based on particle swarm optimization (PSO) [60,61].
PSO is a relatively new population-based evolutionary computation technique. The approach uses M particles to construct a swarm and searches for the optimal solution via iteration in a D-dimensional space. Each particle comprises several parameters, including the current position, the velocity, and the best position found by the particle. For a D-dimensional search space, these parameters are represented with D-dimensional vectors. The position and velocity of particle k are presented in Equation (23): x k = (x k1 , x k2 , ..., x kD ) and v k = (v k1 , v k2 , ..., v kD ). At the nth iteration step, the position and velocity of particle i are updated according to Equation (24).
where ω is the inertia weight; r 1 and r 2 are two distinct random values between 0 and 1; c 1 and c 2 are the acceleration constants known as the cognitive and social scaling parameters, respectively; p i is the best previous position of the particle itself; and p g denotes the best previous position of all particles of the swarm. A large value of ω facilitates global exploration with increased diversity, whereas a small value promotes local exploitation [62].
In terms of image registration, x k (x k1 , x k2 ) is the center of image I veft , and p g is the searching result serving as the best matching position of image I iet and image I veft . As a result of the complex motion of medium-altitude UAVs and cameras, the translational motion between the visible image and the infrared image presents a certain vibration, which requires the search algorithm to automatically adjust the inertia weight ω. A time-varying ω is then proposed in Equation (25), where t represents the time of image capture. The first item, ω 0 , is the constant inertia weight, which provides confirmed global and local searching ability. The second item, rω 1 , is the stochastic inertia weight; it allows the algorithm to jump out of local optima to maintain diversity and global exploration, and r is a distinct random value between 0 and 1. The third item is the motion-adaptive inertia weight, which balances global and local searching according to the translational motion between the visible image and the infrared image. p t−1 g (x g1 (t − 1), x g2 (t − 1)) and p t−2 g (x g1 (t − 2), x g2 (t − 2)) are the best previous positions of all particles of the swarm at moments t − 1 and t − 2, respectively. u v and v v are the row and column sizes of the visible image, respectively. In this study, ω 0 = 0.5 and ω 1 = 0.2.
As the result of the searching algorithm, p t g (x g1 (t), x g2 (t)) is the best position, at which the similarity of image I iet and image I veft is the highest. The precise translation from the scale- and coarse translation-transformed infrared image to the visible image can then be expressed as Equation (26).
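The particle update of Equation (24) and the time-varying inertia weight of Equation (25) can be sketched as below. Since Equation (25) is not reproduced in the text, the motion-adaptive third term here is our assumption: the displacement of the swarm's best position between t − 2 and t − 1, normalized by the visible-image size.

```python
import random

def inertia_weight(pg_t1, pg_t2, u_v, v_v, w0=0.5, w1=0.2):
    # Equation (25), sketched: constant term + stochastic term +
    # motion-adaptive term (our assumed form), which grows with the
    # recent motion of the swarm's best position.
    motion = abs(pg_t1[0] - pg_t2[0]) / u_v + abs(pg_t1[1] - pg_t2[1]) / v_v
    return w0 + random.random() * w1 + motion

def pso_step(x, v, p_i, p_g, w, c1=2.0, c2=2.0):
    # Equation (24): velocity update, then position update, per particle.
    r1, r2 = random.random(), random.random()
    v_new = [w * vd + c1 * r1 * (pid - xd) + c2 * r2 * (pgd - xd)
             for xd, vd, pid, pgd in zip(x, v, p_i, p_g)]
    x_new = [xd + vd for xd, vd in zip(x, v_new)]
    return x_new, v_new
```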

Simplified PCNN Model
PCNN is a type of feedback network originally developed to explain the characteristics of neurons in the visual cortex of a cat. As a result of synchronous pulsing and global coupling, PCNN neurons can realize automatic information transmission and achieve good results in the field of image fusion. A PCNN is connected by a number of neurons, and each neuron corresponds to a pixel of the image. Owing to the complexity of the original PCNN model, a simplified PCNN model [63] is adopted in this study. The mathematical formulation is described in Equation (27).
where n denotes the iteration count. F ij (n), L ij (n), and Y ij (n) represent the feedback input, link input, and output of the (i, j) neuron in the nth iteration, respectively. I ij , U ij , and θ ij are the external input signal, internal activity term, and output of the variable threshold function, respectively. β, W, and V θ are the link strength, link weight coefficient matrix, and threshold magnification factor, respectively; a L and a θ are the time decay constants of the link input and the threshold, respectively.
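A compact sketch of one widely used simplified PCNN iteration, in the spirit of Equation (27). Because the equation itself is not reproduced here, the exact coupling (8-neighbor firing), constants, and initial threshold below are our assumptions, not the paper's.

```python
import math

def pcnn_fire_counts(img, n_iter=10, beta=0.2, a_l=0.7, a_th=0.2, v_th=20.0):
    # Simplified PCNN (cf. Equation (27)): the feedback input F is the
    # pixel itself, the link input L couples 8-neighbor firing, the
    # internal activity is U = F * (1 + beta * L), and a decaying
    # threshold with magnification v_th controls pulses. Returns each
    # neuron's firing count, often used as the fusion decision map.
    h, w = len(img), len(img[0])
    L = [[0.0] * w for _ in range(h)]
    Y = [[0] * w for _ in range(h)]
    theta = [[1.0] * w for _ in range(h)]
    fires = [[0] * w for _ in range(h)]
    for _ in range(n_iter):
        Y_prev = [row[:] for row in Y]
        for i in range(h):
            for j in range(w):
                link = sum(Y_prev[i + di][j + dj]
                           for di in (-1, 0, 1) for dj in (-1, 0, 1)
                           if (di or dj) and 0 <= i + di < h and 0 <= j + dj < w)
                L[i][j] = math.exp(-a_l) * L[i][j] + link
                U = img[i][j] * (1.0 + beta * L[i][j])
                Y[i][j] = 1 if U > theta[i][j] else 0
                theta[i][j] = math.exp(-a_th) * theta[i][j] + v_th * Y[i][j]
                fires[i][j] += Y[i][j]
    return fires
```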

NSCT-Based Image Decomposition
Nonsubsampled contourlet transformation (NSCT) is developed from contourlet transformation. NSCT consists of two parts, namely, nonsubsampled pyramid filter banks (NSPFBs) and nonsubsampled directional filter banks (NSDFBs). NSPFBs give NSCT its multiscale property: through decomposition, the image produces a lowpass subband and a bandpass subband, and each decomposition level is then iterated on the lowpass subband. An NSDFB is a set of two-channel nonsubsampled filter banks based on the sector directional filter bank designed by Bamberger and Smith [64]. The NSDFB carries out the directional decomposition of the bandpass subband produced by the NSPFB and obtains directional subband images with the same size as the original image. Three levels of the NSCT transform are shown in Figure 10; the number of directional subbands doubles at each successive level.

Fusion Algorithm
Based on PCNN and NSCT, the scheme of the visible and infrared image fusion algorithm is introduced in Section 2.2.3. The specific steps of the method are as follows.

1. IHS transform of visible image.
The IHS transform is used to preserve the color information of visible images; it converts an image from the RGB color space to the IHS color space with the aid of Equations (28)-(30), where I denotes intensity, H denotes hue, and S denotes saturation. H and S are preserved for the final IHS inverse transform, and I is fused with the infrared image.

2. NSCT transform of infrared image and I channel of visible image.
As the infrared sensor and visible light sensor can zoom individually, the spatial resolution of the infrared image may be lower than that of the visible light image. Thus, directly replacing the I channel of the visible image with the infrared image may cause the spatial resolution of the fusion image to decline. NSCT multi-scale decomposition is used to solve this problem. The gray image (8 bit) of the infrared image and the I channel (8 bit) of the visible image are decomposed into three levels through the NSCT transform. One image can be decomposed into one lowpass subband and several bandpass subbands. The lowpass subband represents the outline of the original image, and the bandpass subbands represent the edges and textures of the image.

3. Enhancement of lowpass subband of infrared image.
Based on the NSCT transform, the lowpass subband of the infrared image is processed via histogram equalization to enhance the contrast between the target and the background and to highlight the areas of interest.

4. Lowpass subband fusion.
During the lowpass subband fusion of the visible light and infrared images, the coefficients are selected according to the principle of the maximum absolute value.

5. Bandpass subband fusion.
The bandpass subband fusion of the visible light and infrared images is based on PCNN. The method chooses the regional energy, which reflects the local phase characteristics of the image, as the link strength β of the neuron. Assuming that (i, j) is the center of a region of size M × N, the regional energy E k ij is expressed as Equation (31), where D k ij represents the bandpass subband coefficient of the kth level at (i, j) of the image.

6. NSCT inverse transform using fusion lowpass subband and fusion bandpass subbands.

New fusion lowpass and bandpass subbands are generated based on Equations (4) and (5). Then, a new I channel can be obtained through the NSCT inverse transform.

7. IHS inverse transform using H channel, S channel, and new I channel.
Using the new I channel and the preserved H channel and S channel, the fusion image in the RGB color space can be calculated with Equations (32)-(34).
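The coefficient selection rules in steps 4 and 5 can be sketched as follows; the region size and boundary handling are our choices, and the function names are ours.

```python
def fuse_lowpass(a, b):
    # Step 4: per coefficient, keep the one with the maximum absolute value.
    return [[x if abs(x) >= abs(y) else y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def regional_energy(D, i, j, m=3, n=3):
    # Equation (31): sum of squared bandpass coefficients over the m x n
    # region centered at (i, j), used as the PCNN link strength in step 5.
    h, w = len(D), len(D[0])
    e = 0.0
    for di in range(-(m // 2), m // 2 + 1):
        for dj in range(-(n // 2), n // 2 + 1):
            if 0 <= i + di < h and 0 <= j + dj < w:
                e += D[i + di][j + dj] ** 2
    return e
```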

Study Area and Dataset
The study area is located inland in Eastern China, as shown in Figure 11. The main landform types include cities, villages, and open fields. After a number of flights, a database including one hundred hours of visible light and infrared videos and metadata was established.

Spatial Geometry-Based Scale Calculation
According to Section 2.3.2, the scale transformation from the infrared image to the visible image is determined by the pixel sizes and focal lengths of the two sensors. In the visible light and infrared integrated camera, the focal length of the visible light sensor can be varied continuously within a certain range, whereas the focal length of the infrared sensor has only two fixed values of 540 mm and 135 mm. In this section, three experiments with different focal lengths are designed to test the performance of the spatial geometry-based scale calculation. The source data are shown in Table 4, and the results are shown in Table 5 and Figures 12-14.
According to the fusion results, the two images maintain consistency in shape and size, as indicated by the clarity and lack of aliasing in the overlapping pixels. This result proves the validity of the spatial geometry-based scale calculation.

Coordinate Transformation-Based Coarse Translation Estimation
After the scale calculation, the infrared image is converted to the same plane as the visible image. According to Section 2.4, coarse translation estimation can calculate the translation M Tc from the infrared scale-transformed image I iS to the original visible image I v . The infrared image after coarse translation transformation can then be obtained with Equation (36).
Figure 15 shows the fusion image of the coarse translation-transformed infrared image I iSTc and the original visible image I v obtained with Equation (36).
As shown in Figure 15 and Table 6, the coarse translation has a positive effect on the registration of the infrared image and visible image, but the result fails to reach a high level of accuracy, and the error fluctuates.

Image Edge-Based Translation Estimation
Precise translation estimation is performed based on image edge features to achieve accurate registration. In this estimation, the coarse translation-transformed infrared image I iSTc is converted to the precise translation-transformed image I iSTcTp with Equation (37),
where M Tp can be obtained following the description in Section 2.5.
Figure 16 shows the fusion image of the precise translation-transformed infrared image I iSTcTp and the original visible image I v obtained with Equation (37). Comparing Figures 15 and 16 indicates that the fusion image based on precise translation is better than that based on coarse translation because of its clear edges in the overlapping region and the absence of aliasing. As indicated in Table 7, image registration accuracy is significantly improved. When the spatial resolution of the infrared image (Figure 17b) is low, directly replacing the I channel of the visible image (Figure 17a) with the infrared image causes the spatial resolution of the fusion image to decline (Figure 17c). The proposed NSCT- and PCNN-based method can generate a fusion image with satisfactory spatial resolution (Figure 17d); the spatial resolution of Figure 17d is higher than that of Figure 17c.

Fusion of Interesting Areas
Another important purpose of image fusion is to highlight target information. Figure 18 shows the saliency analysis between the original images and the fusion images in two scenes. Figure 18a,b,d,e shows the original images, and Figure 18c,f shows the fusion results of the proposed method. The yellow frame areas mark the low-saliency areas in the visible image. The fusion results show that these areas become markedly more salient.


Performance Analysis of Image Registration
In the performance test experiments, we chose 257 groups of images and corresponding metadata with three typical types of motion: translation, rotation, and scale. Based on the result of the scale transformation, we tested the performance of five methods: the proposed integrated parallel vision-based registration (IPVBR), alignment metric-based registration (AMBR) [32], mutual information-based registration (MIBR) [16], peak signal-to-noise ratio-based registration (PSNRBR), and structural similarity-based registration (SSIMBR). PSNRBR and SSIMBR are two registration methods that use PSNR and SSIM as the similarity standard, respectively [65].
Under each motion condition, the values of the root mean square error (RMSE) are calculated using Equation (38).
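Equation (38) is not reproduced in the text; a generic point-pair RMSE of the form commonly used for registration error is sketched below, with the caveat that the paper's exact operands may differ.

```python
import math

def rmse(est, ref):
    # Root mean square error over matched point pairs: estimated
    # registered positions against reference positions.
    n = len(est)
    total = sum((ex - rx) ** 2 + (ey - ry) ** 2
                for (ex, ey), (rx, ry) in zip(est, ref))
    return math.sqrt(total / n)
```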

The average RMSE values of the five methods in the three experiments are shown in Table 8. As shown in Figures 19-21, the RMSE curve of IPVBR remains stable and low, whereas the four other curves perform differently. The curve of SSIMBR performs well in Experiments 2 and 3 but vibrates strongly in Experiment 1. The curve of PSNRBR maintains a certain vibration in Experiments 1 and 3. The curve of AMBR shows some large errors in Experiment 2 and high vibrations in Experiments 1 and 3. The curve of MIBR shows neither clearly good nor clearly bad performance. As shown in Table 8, the proposed IPVBR achieves the minimum average RMSE in the three experiments; SSIMBR also achieves a low average RMSE.
Three points can be concluded from these three experiments.

1. Compared with the four other methods, the proposed IPVBR presents a stable and low RMSE. This result shows the high stability and precision of the proposed method.
2. SSIMBR performs better than PSNRBR, which indicates that structure information is more reliable than pixel information for multimodal image registration.

3. The two representative conventional methods, AMBR and MIBR, fail to achieve good results under the three motion conditions for medium-altitude UAV applications.
The three experiments are conducted on data for which all five algorithms can obtain nearly correct results. In some cases, the compared image-based algorithms fail to solve the perspective transform, whereas the proposed edge feature extraction and matching method remains effective in translation calculation. In such cases, the results reflect the obvious advantages of the proposed method.

Performance Analysis of Image Fusion
To analyze the fusion performance, this study introduces three other methods for comparison: IHS transform-based fusion (IHSBF), PCA-based fusion (PCABF) [66], and SIDWT-based fusion (SIDWTBF) [67].
Using 10 sets of visible and infrared images of different scenes as the experimental data, we select the average gradient (Equation (39)) and the Shannon value (Equation (40)) as the evaluation indexes of the four methods. The average gradient sensitively reflects the ability of the image to express the smallest details and can be used to evaluate image clarity: a high average gradient corresponds to a clear image. A high Shannon value corresponds to a large amount of information in the image. In Equation (39), f(x, y) denotes the pixel value at (x, y) and M × N denotes the image resolution; in Equation (40), i represents a sample in the image and P_i represents the probability of the sample.
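Since Equations (39) and (40) are not reproduced in this excerpt, the two indexes can be sketched from their standard definitions; the function names and the 256-bin histogram below are our choices, not the paper's:

```python
import numpy as np

def average_gradient(img):
    """Average gradient of a grayscale image (clarity proxy).

    Assumes the common definition: the mean over all pixels of
    sqrt(((df/dx)^2 + (df/dy)^2) / 2) -- higher means sharper detail.
    """
    f = img.astype(np.float64)
    gx = np.diff(f, axis=1)[:-1, :]   # horizontal differences
    gy = np.diff(f, axis=0)[:, :-1]   # vertical differences
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def shannon_entropy(img, bins=256):
    """Shannon value H = -sum_i P_i * log2(P_i) over the gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins to avoid log(0)
    return -np.sum(p * np.log2(p))
```

A flat image scores zero on both indexes, while a detailed image scores higher on both, which is the behavior the comparison in Table 9 relies on.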
The average gradient and Shannon results are shown in Figure 22, and the average values of the four image fusion methods are listed in Table 9.
As shown in Figure 22a,b, the two groups of curves of our method are high and stable. Table 9 shows that the average values of our method are higher than those of the other three methods. The results also show that the fusion image obtained by our method has higher contrast, better details, and more information than the images obtained with the other methods.

Conclusions
Visible and infrared image registration is a difficult problem for medium-altitude UAVs because of the different imaging mechanisms, poor image quality, and large amounts of motion in videos. Given the special requirements of UAV applications, an appropriate image fusion method becomes a key technology.
This study proposed a novel image registration method that uses both metadata and image content, based on an analysis of the imaging characteristics of the most common visible light and infrared integrated camera. The main contributions of this work are reflected in three aspects. First, we reveal the principle of long-distance integrated parallel vision, which provides the theoretical foundation for converting a perspective transformation into scale and translation transformations. Second, two new algorithms for scale calculation and coarse translation estimation are presented using the image metadata of the UAV system, according to spatial geometry and coordinate transformation. Third, an edge distance field-based registration is proposed for precise translation estimation to handle the non-strict edge alignment between the visible image and the infrared image. A searching algorithm based on PSO is also put forward to improve efficiency. For image fusion, this study designs a new method based on PCNN and NSCT. This method meets the four requirements of UAV applications: preserving color information, adding infrared brightness information, improving spatial resolution, and highlighting target areas.
A medium-altitude UAV was employed to collect experimental data, including three typical groups with translation, rotation, and scale motion. Results show that the proposed method achieves encouraging performance in image registration and fusion. The method can also be applied to other medium-altitude or high-altitude UAVs with a similar system structure. However, future work should focus on further analysis and experiments, such as an improved transformation of the edge distance field and real-time optimization of image fusion.

Figure 1. Medium-altitude UAV and multi-sensor-based remote sensing.

Figure 2. UAV airborne visible light and infrared integrated camera platform with two degrees of freedom.

Figure 3. Visible light and infrared integrated camera, in which the two imaging axes are parallel to each other.
In Figure 3, a_g b_g c_g d_g represents the FOV of the two sensors, and line b_g c_g is the common FOV. f_v and f_i are the focal lengths of the two sensors, O_v and O_i are the two foci, D_a is the distance between the two imaging axes, and D_vg and D_ig denote the distances from the image planes to the ground. Based on the pinhole imaging principle, Equations (…) can be derived.
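Under this long-distance, parallel-axis setup, the pinhole model makes the scale between the two images depend only on per-sensor constants, since the shared object distance cancels. A minimal sketch under that assumption (the function and parameter names are illustrative, not the paper's equations):

```python
def scale_visible_to_infrared(f_v_mm, f_i_mm, pitch_v_um, pitch_i_um):
    """Pixel scale ratio between the two sensor images.

    A ground object of extent X at distance D >> f spans f*X/D meters
    on a sensor, i.e. (f*X/D)/pitch pixels. With parallel axes and a
    common D, the ratio of the two pixel extents reduces to
    (f_v/pitch_v) / (f_i/pitch_i), using only metadata constants.
    """
    return (f_v_mm / pitch_v_um) / (f_i_mm / pitch_i_um)
```

For example, a 50 mm visible lens on a 5 µm-pitch sensor paired with a 25 mm infrared lens on a 15 µm-pitch sensor gives a scale ratio of 6, meaning the infrared image must be magnified six-fold before translation estimation.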

M_Tc is the coarse registration matrix from the visible image to the infrared image based on metadata, and M_Tp is the precise registration matrix based on the image matching method.
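The coarse and precise steps compose into one overall mapping. A minimal homogeneous-coordinate sketch (the scale-plus-translation form of M_Tc and the translation-only M_Tp follow the paper's description, but the numbers here are illustrative):

```python
import numpy as np

def similarity(s=1.0, tx=0.0, ty=0.0):
    """3x3 homogeneous matrix for scale s plus translation (tx, ty)."""
    return np.array([[s, 0.0, tx],
                     [0.0, s, ty],
                     [0.0, 0.0, 1.0]])

# Coarse matrix from metadata (scale + rough shift), then a precise
# translation-only refinement from image matching; the full mapping is
# the product M = M_Tp @ M_Tc, applied coarse step first.
M_Tc = similarity(s=0.8, tx=120.0, ty=40.0)
M_Tp = similarity(tx=-3.0, ty=5.0)
M = M_Tp @ M_Tc
```

A point is mapped by multiplying with its homogeneous coordinates, e.g. M @ [x, y, 1]; because both factors are scale/translation matrices, the product stays in that family, which is exactly the simplification the long-distance parallel-vision analysis provides.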

Figure 4. Process of visible and infrared image registration, including scale calculation, coarse translation estimation, and precise translation estimation.

Figure 5. Process of visible and infrared image fusion based on PCNN and NSCT.

Figure 6. Five coordinate systems of coarse translation estimation.

Figure 7. Non-strictly aligned characteristics of edges: (a) original visible image; (b) original infrared image; (c) visible edge image; and (d) infrared edge image.

f(D(p)) = exp(-D(p)^2 / (2σ^2))  (20)
where f(D(p)) represents the matching probability. The standard deviation is set to σ = R/3; in this paper, R = 10, although it could differ in specific situations. Based on Equation (20), the distance field map I_vef of the visible image is established, as shown in Figure 8.
Remote Sens. 2017, 9, 441
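Assuming the Gaussian weighting with σ = R/3 described above, the field can be sketched as follows; the brute-force nearest-edge search stands in for a proper distance transform (e.g., scipy.ndimage.distance_transform_edt) and is fine for small images:

```python
import numpy as np

def edge_distance_field(edge_map, R=10):
    """Gaussian-weighted distance field of a binary edge image.

    Each pixel p gets f(D(p)) = exp(-D(p)^2 / (2*sigma^2)) with
    sigma = R/3, so the field equals 1 on edge pixels and decays
    smoothly with distance, which tolerates the non-strict alignment
    of visible and infrared edges during template matching.
    """
    edge_map = np.asarray(edge_map, dtype=bool)
    ys, xs = np.nonzero(edge_map)          # edge pixel coordinates
    gy, gx = np.indices(edge_map.shape)
    # Brute-force Euclidean distance to the nearest edge pixel; a real
    # implementation would use a fast distance transform instead.
    d2 = (gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2
    D = np.sqrt(d2.min(axis=-1))
    sigma = R / 3.0
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))
```

Matching an infrared edge template against this field then rewards edges that land near (not necessarily exactly on) visible edges, which is the point of the distance-field formulation.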

Figure 8. Edge distance field transformation based on Gaussian: (a) visible edge image; and (b) distance field map of visible edge.

Figure 9. Infrared template image extraction and template image searching in the distance field map of visible edge: (a) infrared edge image; and (b) distance field map of visible edge.

Figure 10. NSPFB and NSDFB of the NSCT transform. The left-hand portion is the image decomposition based on NSPFB. The right-hand portion shows the decomposition of each subband in different directions based on NSDFB.

Figure 11. Study area and flight course covering about 300 km² in Eastern China.

Figure 12. First experiment of scale calculation: (a) original image; (b) original infrared image; (c) scale-transformed result of image (b); and (d) fusion image of images (a) and (c).

Figure 13. Second experiment of scale calculation: (a) original image; (b) original infrared image; (c) scale-transformed result of image (b); and (d) fusion image of images (a) and (c).

Figure 14. Third experiment of scale calculation: (a) original image; (b) original infrared image; (c) scale-transformed result of image (b); and (d) fusion image of images (a) and (c).

According to Section 2.4, coarse translation estimation can calculate the translation M_Tc from the infrared scale-transformed image I_iS to the original visible image I_v. Then, the infrared image after coarse translation transformation can be obtained with Equation (36).
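For illustration, an integer-pixel version of this translation step can be sketched as below; Equation (36) itself is not reproduced in this excerpt, and a real pipeline would use subpixel warping (e.g., OpenCV's warpAffine):

```python
import numpy as np

def translate_image(img, tx, ty, fill=0):
    """Shift img by integer (tx, ty) pixels (x right, y down), filling
    exposed areas with `fill` -- an illustrative stand-in for applying
    the coarse translation M_Tc (assumes |tx| < width, |ty| < height)."""
    out = np.full_like(img, fill)
    h, w = img.shape[:2]
    # Destination and source windows for the overlapping region.
    xs = slice(max(tx, 0), min(w + tx, w))
    ys = slice(max(ty, 0), min(h + ty, h))
    xs_src = slice(max(-tx, 0), min(w - tx, w))
    ys_src = slice(max(-ty, 0), min(h - ty, h))
    out[ys, xs] = img[ys_src, xs_src]
    return out
```

The coarse estimate from metadata only needs to bring the two images close enough that the precise, distance-field-based search can refine the remaining offset.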

Figure 15 shows the fusion image of the coarse translation-transformed infrared image and the original visible image.

Figure 15. Fusion image of the coarse translation-transformed infrared image and the original visible image: (a) first experiment image; (b) second experiment image; and (c) third experiment image.

The precise translation is then estimated following the description in Section 2.5. Figure 16 shows the fusion image of the precise translation-transformed infrared image and the original visible image.

Figure 16. Fusion image of the precise translation-transformed infrared image and the original visible image: (a) first experiment image; (b) second experiment image; and (c) third experiment image.

Figure 17. Fusion of visible image and low spatial resolution infrared image: (a) visible image; (b) infrared image; (c) fusion image based on IHS; and (d) fusion image based on the proposed method.

Figure 18. Fusion of interesting areas in two scenes: (a,b,d,e) original images; and (c,f) fusion images based on the proposed method.
where the measurement error E_i denotes the pixel distance from the corresponding calculated matching point (x_c, y_c) to the actual matching point (x_a, y_a) in the visible image. The error analysis results of the three experiments are shown in Figures 19–21.
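With this definition of E_i, the average RMSE reported in Table 8 can be sketched as follows (the square-root-of-mean-squared-error convention is assumed here):

```python
import math

def registration_rmse(calc_pts, actual_pts):
    """RMSE of registration over a frame's matching points.

    E_i is the Euclidean pixel distance from each calculated matching
    point (x_c, y_c) to its actual matching point (x_a, y_a);
    RMSE = sqrt(mean(E_i^2)).
    """
    errs_sq = [(xc - xa) ** 2 + (yc - ya) ** 2
               for (xc, yc), (xa, ya) in zip(calc_pts, actual_pts)]
    return math.sqrt(sum(errs_sq) / len(errs_sq))
```

Averaging this per-frame value over a whole sequence gives the kind of summary statistic compared across the five methods in Table 8.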

Figure 19. Performance analysis of the first experiment under translation conditions.

Figure 20. Performance analysis of the second experiment under rotation conditions.

Figure 21. Performance analysis of the third experiment under scale conditions.

Figure 22. Average gradient and Shannon values of the four image fusion methods: (a) average gradient; and (b) Shannon value.

Table 1. Main parameters of the employed medium-altitude UAV.

Table 3. Coordinate transformations and relevant metadata.
Translation of coordinate system center: L, B, H_a, H_g

Table 4. Source data for scale calculation.

Table 5. Infrared image after scale transformation.

Table 6. Results of coarse translation estimation.

Table 7. Results of precise translation estimation.

Table 8. Average RMSE of the five methods.

Table 9. Average gradient and Shannon value of the four methods.
