Four-Band Thermal Mosaicking: A New Method to Process Infrared Thermal Imagery of Urban Landscapes from UAV Flights

: Unmanned aerial vehicles (UAVs) support a large array of technological applications and scientiﬁc studies due to their ability to collect high-resolution image data. The processing of UAV data requires the use of mosaicking technology, such as structure-from-motion, which combines multiple photos to form a single image mosaic and to construct a 3-D digital model of the measurement target. However, the mosaicking of thermal images is challenging due to low lens resolution and weak contrast in the single thermal band. In this study, a novel method, referred to as four-band thermal mosaicking (FTM), was developed in order to process thermal images. The method stacks the thermal band obtained by a thermal camera onto the RGB bands acquired on the same ﬂight by an RGB camera and mosaics the four bands simultaneously. An object-based calibration method is then used to eliminate inter-band positional errors. A UAV ﬂight over a natural park was carried out in order to test the method. The results demonstrated that with the assistance of the high-resolution RGB bands, the method enabled successful and e ﬃ cient thermal mosaicking. Transect analysis revealed an inter-band accuracy of 0.39 m or 0.68 times the ground pixel size of the thermal camera. A cluster analysis validated that the thermal mosaic captured the expected contrast of thermal properties between di ﬀ erent surfaces within the scene.


Introduction
Unmanned aerial vehicles (UAVs) have become a reliable observation platform for environmental remote sensing applications, including wildfire mapping [1,2], atmospheric studies [3][4][5], precision agriculture [6][7][8] and plant identification [9][10][11].Compared to the traditional remote sensing platforms such as satellites and manned aircrafts, which have low spatiotemporal resolutions and high operational costs [12], light-weight UAVs are less costly and more flexible.In spite of some limitations related to weather tolerance, maximum payload, civil aviation regulation, and traveling distance [13], UAVs can collect high temporal resolution data at ultra-high spatial resolutions [14] and are versatile due to their ability to change speed, direction and elevation as required [15].Benefitting from the advantages of the UAV platform, UAV photogrammetry and remote sensing have opened new opportunities for landscape visualization and analysis.Lucieer et al. mapped the landslide displacement using high-resolution and multi-temporal UAV photos [16].Techniques of 3-D modelling, like structure-from-motion (SfM) [17], were combined to process the acquired UAV photos and to construct the digital elevation model (DEM).The landscape of an urban area can be investigated in detail with the acquisition of multi-band UAV imagery [18], which can yield more valuable information than satellite imagery.
The individual photos collected by a UAV camera should be mosaicked for remote sensing applications.Image mosaicking is a technique that combines multiple image patches to form a larger image [19].To date, it has been integrated with other sophisticated spatial analysis and detection technologies, such as undersea exploration [20,21], high-resolution satellite imaging [22], fingerprint registration [23], 3-D visualization [19,24], and microscopic observation [25].The past two decades have witnessed the rapid growth of 'structure-from-motion' (SfM), a revolutionary, low-cost, user-friendly photogrammetric technique for image mosaicking [26].SfM extracts common features in different photos using the scale invariant feature transform (SIFT) algorithm.Then, with the help of a bundler algorithm [27], a sparse point cloud is generated, which is subsequently densified by the clustering view for multi-view stereo (CMVS) algorithm [28].The patch-based multi-view stereo (PMVS2) algorithm [28] is finally used to reconstruct a 3-D model of the measurement target.One important advantage of SfM over other methods is its high automation.The whole process of SfM can be completed without prior knowledge of camera pose and its 3-D location [26].SfM has become increasingly user-friendly since the release of SfM-based commercial software, such as Agisoft Photoscan (https://www.agisoft.com)and Pix4D (https://www.pix4d.com).As part of the workflow in these software products, multiple photos are stitched together to form an orthographically projected map called orthomosaic.
The quality of an orthomosaic can be improved by other effective algorithms, such as speeded up robust features (SURF), oriented FAST and rotated BRIEF (ORB), and robust features along the edge (RFAE) [29].Meanwhile, as narrow-band multispectral cameras (e.g., mini-MCA6) and thermal cameras (e.g., FLIR Vue Pro) are increasingly used on board UAVs, image mosaicking is no longer limited to RGB photos.Thermal maps have important merits in different fields of science.For example, thermal maps can be combined with geomorphologic data to monitor potential volcanic unrest in a 3-D view [30], or used to reconstruct the thermal property of plant canopies [31].Other applications include assisting the model calculation of surface energy balance [32] and surveying land use and local climate [33][34][35].
So far, we lack efficient procedures for processing thermal images.Often, thermal images are used as individual frames and without the clear characterization of positional accuracy.Mosaicking UAV thermal infrared (TIR) imagery has usually been not as successful as RGB and multispectral imagery for the following reasons:

1.
As opposed to RGB cameras and multispectral cameras, thermal cameras available on the market usually have low-resolution lenses [36].The resulting coarse-grid raster imagery is poor in texture and poses a challenge for key-point identification.The chance for SfM workflow to fail at sparse point generation, the first step of the workflow, is high [37].

2.
RGB and multispectral cameras both provide multi-band images.With the assistance of multi-band contrast, there is a high tolerance of failed point match.However, most of the thermal data are single-band images, which lowers the probability of finding matched points correctly.

3.
To compensate for the downsides mentioned above, UAV flights to collect thermal images are usually carried out separately at a lower elevation and at a slower speed compared to the flight for other bands [36,38].The separated flight enables a closer watch on the objects, increases photo overlaps, decreases the pixel size and improves image resolution.In addition, auxiliary data such as camera global positioning system (GPS) data and ground control points (GCPs) are usually more indispensable during thermal mosaicking than RGB mosaicking [18,39,40].However, planning a separated flight mission and setting GCP boards on the ground [31,38] beforehand are both labor intensive, not to mention that the thermal band will have a smaller sampled area compared to the RGB band.Moreover, not all types of thermal camera have built-in GPS data, and some camera types are incompatible with the UAV to synchronize its GPS data.Even though the thermal mosaicking without geotag can be successful, it is still challenging to register, with sufficient positional accuracy, the thermal mosaic to the RGB imagery of the same target for proper interpretation of the former [29].
Motivated by these challenges and the importance of thermal mosaicking, this work aims to design a novel method, referred to as the four-band thermal mosaicking (FTM), in order to process thermal images acquired by a UAV, with three goals in mind:

1.
The method overcomes the difficulty of mosaicking low-resolution, single-band thermal imagery.
The flight preparation and the mosaicking process in the SfM-based applications are no more complicated than those used for RGB photos.The method does not require either onboard GPS or GCP data.

2.
The final product, the thermal orthomosaic, can be easily registered to the RGB orthomosaic of the same target.There is no loss of the sampled area in the thermal orthomosaic, and the method allows pixel-by-pixel analyses between the thermal and the RGB bands.

3.
The method provides a simple and robust way to establish relative positional errors and to validate the temperature map.

An Overview of Four-band Thermal Mosaicking (FTM)
Inspired by [14] which shows how multi-band images are mosaicked simultaneously, we seek to combine each thermal photo with its synchronized RGB photo in order to form a four-band image.The four-band images are then imported to the Pix4D software for processing.In the SfM algorithm, the "tie points" are identified using only the RGB bands; no weight is given to the thermal band.To generate the thermal orthomosaic, the software simply combines and projects the thermal band in the same way as the RGB orthomosaic is projected from the tie points.The final four-band large orthomosaic is saved in the TIFF format.The band sequence is red, green, blue and thermal infrared.All the bands have the same pixel numbers and the same pseudo resolution.

Study Area
Our study area is Beaver Pond Park (Figure 1; center coordinate 41.33 N, 72.94 W), in the city of New Haven, Connecticut, United States.The study area covered by the UAV observation has multiple surface types, including trees, grassland, bare soil, water body, and asphalt road.The contrasting surface characteristics are helpful to recognize prominent features on the temperature map due to large temperature differences.
Our study area is Beaver Pond Park (Figure 1; center coordinate 41.33 N, 72.94 W), in the city of New Haven, Connecticut, United States.The study area covered by the UAV observation has multiple surface types, including trees, grassland, bare soil, water body, and asphalt road.The contrasting surface characteristics are helpful to recognize prominent features on the temperature map due to large temperature differences.

Instruments
We used a DJI Phantom 4 Pro Quadcopter (Figure 2a, Table 1) as our UAV platform.Its built-in RGB camera was not used; instead a FLIR DUO R dual-sensor RGB/thermal camera (Figure 2b, Table 2) was attached to the UAV to acquire photos of the landscape.During the flight over the target area (the green box in Figure 1), the FLIR camera took photos in the nadir direction, in its burst mode allowing consecutive capture of synchronized RGB and thermal images at one pair of photos per second.The RGB photos were saved in the 8-bit JPG format and the thermal images in the 16-bit TIFF format.The resolution of the visible lens (1920 by 1080 pixels) was much higher than that of the thermal lens (160 by 120 pixels).The FLIR camera was not embedded with a GPS sensor and no GCPs were set in advance in the study area.

Instruments
We used a DJI Phantom 4 Pro Quadcopter (Figure 2a, Table 1) as our UAV platform.Its built-in RGB camera was not used; instead a FLIR DUO R dual-sensor RGB/thermal camera (Figure 2b, Table 2) was attached to the UAV to acquire photos of the landscape.During the flight over the target area (the green box in Figure 1), the FLIR camera took photos in the nadir direction, in its burst mode allowing consecutive capture of synchronized RGB and thermal images at one pair of photos per second.The RGB photos were saved in the 8-bit JPG format and the thermal images in the 16-bit TIFF format.The resolution of the visible lens (1920 by 1080 pixels) was much higher than that of the thermal lens (160 by 120 pixels).The FLIR camera was not embedded with a GPS sensor and no GCPs were set in advance in the study area.The flight route over Beaver Pond Park was set up using DJI GS PRO, the DJI App for mission planning.The flight mission was carried out at about 10:00 AM on 31 October 2018, a clear day.Air temperature and relative humidity at the time of the flight were about 8 • C and 60%, respectively.These atmospheric parameters and the sky condition were set to the camera prior to the flight in order to correct the effect of atmospheric absorption.The flying height was 85 m and the whole flight took about 10 minutes.Four swaths of consecutive photos were taken in the NW-SE direction in parallel to the long dimension of the rectangular target area shown in Figure 1, with the front and side overlaps of 85% and 75% for RGB photos and of 85% and 50% for thermal photos, respectively.A total of 440 pairs of RGB and thermal photos were selected after manually filtering out some blurry RGB photos.

The Workflow of Four-band Thermal Mosaicking
2.3.1.Pre-processing: Creating Four-band Images Figure 3 includes the workflow of image pre-processing.The thermal lens has a narrower Field of View (FOV = 57 • ) than the visible lens (FOV = 72 • ) in the horizontal directions, and has exactly the same FOV in vertical direction (44 • ).Because the RGB lens has a vertical resolution (1080) that is a multiple of that the thermal does (120), it is possible to up-sample the coarser thermal image by duplicating each of the original pixel to a 9 by 9 matrix of smaller pixels with identical Digital Number (DN) value.After the up-sampling, the thermal photo has the same number of rows as the RGB photo.The next step is cropping the visible photo to the same size as the thermal photo by removing an edge of 240 by 1080 pixels from each side.The side overlap was decreased to 50% because of cropping.After resizing, the thermal band can be stacked onto the visible band to create a four-band TIFF image with the sequence of red, green, blue, and thermal infrared.This batch process can be done in MATLAB (MATLAB and Statistics Toolbox Release 2017b, The MathWorks, Inc., Natick, Massachusetts, United States).
features are used as control points to register the UAV orthomosaics on the World Imagery base map in Esri's ArcMap.By selecting control points (a total of 8 in the present case) in the RGB orthomosaic, the thermal orthomosaic is georeferenced along the way (Figure 3).
The pixel DN value of the thermal band is proportional to the flux of infrared radiation received by the thermal lens.The DN value can be converted to temperature on the principle of the Stefan-Boltzmann Law [42].Before stacking, the RGB bands in the 8-bit JPG format are converted to the 16-bit "TIFF" format.This conversion is necessary because the DN value in the 8-bit range (0-255) is too small compared with the DN value in the 16-bit range (0-65535).However, a complete linear stretching from 255 to 65535 would sometimes lead to failure of tie point generation in the SfM workflow.Even if the workflow was able to complete, the final RGB othomoasic would be blurry.One possible reason for this is that linear stretching increases the minimum possible difference between DN values, and by doing so the contrast between similar features is overly magnified, making key-point detection harder.The end result is that more images are incorrectly registered.
A good solution is to stretch the DN value of the RGB bands by multiplying with a specific integer.After several trial-and-error attempts, we have chosen a multiplier of 10, bringing the range of the stretched DN values to a similar magnitude of the 16-bit range.We have found that if a larger multiplier is used, the processing time will become exceedingly long, but without obvious quality improvement.

Mosaicking and Post-processing
After importing the four-band images into Pix4Dmapper (version 4.3.31;https://www.pix4d.com),we need to allocate the band weight when editing the camera mode in the image properties editor (Table 3).The thermal band (IR) should have no weight, and via tests we determined that the green band should have the most weight.It is also important that the focal length be set to 34.8 mm, the focal length of the thermal lens.Since the four-band image has been cropped to the same FOV as the thermal image, the choice of this focal length is appropriate.This focal length has been confirmed via the EXIFTOOL [41].What follow next are the normal steps of the SfM algorithm: finding sparse point clouds, generating dense point clouds, building mesh and digital surface model (DSM), and producing orthomosaic.The whole SfM workflow in Pix4D took about 48 minutes.The four-band orthomosaic is later split to an RGB orthomosaic and a thermal orthomosaic via MATLAB.After splitting, the RGB band DN values are converted back to the 8-bit radiometric resolution, by dividing with the factor 10. Because of the rich texture of the RGB orthomosaic, distinct features are easily detected.These features are used as control points to register the UAV orthomosaics on the World Imagery base map in Esri's ArcMap.
By selecting control points (a total of 8 in the present case) in the RGB orthomosaic, the thermal orthomosaic is georeferenced along the way (Figure 3).
The pixel DN value of the thermal band is proportional to the flux of infrared radiation received by the thermal lens.The DN value can be converted to temperature on the principle of the Stefan-Boltzmann Law [42].

Image Orthomosaics
The resulting orthomosaics show that the surveyed area covers 12.18 hectares.The pixel size is 6.36 cm.Comparison of the RGB and the thermal orthomosaic in Figure 4 reveals strong contrasts between difference surfaces.For example, the pond water was relatively warm because the large specific heat capacity of liquid water helped it maintain high temperature in the morning.A garage with metal roof, identifiable on the RGB orthomosaic, showed an extremely low thermal signature (DN value of 2250) due to the low-emissivity of the roof.Trees were cooler than the water surface because of insufficient heating by solar radiation in the morning.A more detailed comparison of thermal properties is given in Section 3.5.

Image Orthomosaics
The resulting orthomosaics show that the surveyed area covers 12.18 hectares.The pixel size is 6.36 cm.Comparison of the RGB and the thermal orthomosaic in Figure 4 reveals strong contrasts between difference surfaces.For example, the pond water was relatively warm because the large specific heat capacity of liquid water helped it maintain high temperature in the morning.A garage with metal roof, identifiable on the RGB orthomosaic, showed an extremely low thermal signature (DN value of 2250) due to the low-emissivity of the roof.Trees were cooler than the water surface because of insufficient heating by solar radiation in the morning.A more detailed comparison of thermal properties is given in Section 3.4.
This preliminary result has already achieved our first goal: the FTM process is no more complicated than the standard RGB mosaicking in the SfM workflow.In the FTM process, the thermal mosaicking is simultaneously accomplished when the corresponding RGB images are stitched the classic way.No geographic information is used in this process.Instead, the orthomosaic is georeferenced using a simple ArcMap georeferencing tool as a post-mosaic step.As for our second goal, the thermal orthomosaic is generated from the same point cloud of the RGB orthomosaic, therefore having an identical sampled area.This preliminary result has already achieved our first goal: the FTM process is no more complicated than the standard RGB mosaicking in the SfM workflow.In the FTM process, the thermal mosaicking is simultaneously accomplished when the corresponding RGB images are stitched the classic way.No geographic information is used in this process.Instead, the orthomosaic is georeferenced using a simple ArcMap georeferencing tool as a post-mosaic step.As for our second goal, the thermal orthomosaic is generated from the same point cloud of the RGB orthomosaic, therefore having an identical sampled area.

Positional Error Assessment
Inter-band registration is a process that aligns the same features in difference bands accurately.Before the registration, it is important to understand the reasons for misalignment, or in other words, the sources of positional error.
One error type, the lens spacing error, is related to the separation between the visible lens and the thermal lens on the camera body (Figure 2b).This type of error can be determined accurately.Figure 5a explains how the objects at different heights in the camera FOV are projected to the background, which is the ground surface in the present study, to form a 2-D image.As for the object right on the ground surface (the green circle), the sight lines (green arrows) of the visible and the thermal lenses intersect on the ground, so the pixels of the object are projected to have the same geolocation in the RGB and the thermal images.However, the sight lines (red arrows) of an object above the (the red circle) intersect aloft, and the object projected in the thermal image occurs to the right of its projection in the RGB image.This misalignment will increase as the object being observed becomes higher above the ground.In Figure 5b, we have: where d is the lens spacing and d 1 is the misalignment between the two projections, D is the vertical distance between the on-board camera and the object being observed, and H is the height of the object above the ground.
One error type, the lens spacing error, is related to the separation between the visible lens and the thermal lens on the camera body (Figure 2b).This type of error can be determined accurately.Figure 5a explains how the objects at different heights in the camera FOV are projected to the background, which is the ground surface in the present study, to form a 2-D image.As for the object right on the ground surface (the green circle), the sight lines (green arrows) of the visible and the thermal lenses intersect on the ground, so the pixels of the object are projected to have the same geolocation in the RGB and the thermal images.However, the sight lines (red arrows) of an object above the ground (the red circle) intersect aloft, and the object projected in the thermal image occurs to the right of its projection in the RGB image.This misalignment will increase as the object being observed becomes higher above the ground.In Figure 5b, we have: where  is the lens spacing and  is the misalignment between the two projections,  is the vertical distance between the on-board camera and the object being observed, and  is the height of the object above the ground.
During the band stacking, we align the RGB and thermal images along their central lines instead of offseting the thermal images appropriately by the distance of lens spacing.Therefore, the lens spacing  itself also becomes part of the lens spacing error (Equation ( 2)), marked as  in Figure 5b, where and the total lens spacing error  is given by Equation ( 3): For our flight over the Beaver Pond Park, the tallest objects are the trees, which are no more than 15 m high.Given the  of 15 m, the flying height   85 m, and the lens seapration  of 2.8 cm, the lens spacing error  is about 3.4 cm, which is smaller than the size of one RGB pixel (6.36 cm).Therefore, no correction is made for this type of positional error.During the band stacking, we align the RGB and thermal images along their central lines instead of offseting the thermal images appropriately by the distance of lens spacing.Therefore, the lens spacing d itself also becomes part of the lens spacing error (Equation (2)), marked as d 2 in Figure 5b, where and the total lens spacing error E d is given by Equation ( 3): Remote Sens. 2019, 11, 1365 9 of 17 For our flight over the Beaver Pond Park, the tallest objects are the trees, which are no more than 15 m high.Given the H of 15 m, the flying height H + D = 85 m, and the lens seapration d of 2.8 cm, the lens spacing error E d is about 3.4 cm, which is smaller than the size of one RGB pixel (6.36 cm).Therefore, no correction is made for this type of positional error.
Another type of positional error is caused by the slight shutter delay between the visible and the thermal lenses (Supplementary Figure S2).This error depends on the moving direction and speed of the UAV, and in theory can be corrected for each swath and flight level.But such a swath-by-swath procedure is labor intensive.The better way of correction is to remove the error for the whole orthomosaic, as discussed below.

Quantifying and Correcting the Misalignment
We designed an object-based method to remove the positional error between the RGB and the thermal bands.Unlike other object-based algorithms [14,29], which register each of the individual photos prior to mosaicking, this method uses objects classified from the orthomosaic (Figure 6c).In total, 20 objects were analyzed.Two of the objects are shown in Figure 6a,b.Each object is characterized by a misalignment vector between the thermal band and the RGB bands.These 20 vectors are plotted in Figure 6d.In each case, the vector base is the object idenfied in the RGB mosaic and is anchored at the origin, and the vector tip is the object identified in the thermal mosaic.The vector plot helps us to determine the total isalignment, i.e., the length of the vector, as well as orientation of the misalignment.By using objects scattered across the whole region, we can determine whether different parts of the thermal orthomosaic have a generally consistent misalignment.Since these vectors have a similar length and orientation, it is acceptable to average them in order to obtain a "mean vector", whose two othognal compoments, in pixel number, are x = 28.20 and y = −5.55.The magnitude of the mean vector is 28.74, representing the mean positional error in pixel number or an actual distance of 182.79 cm.We then corrected the misalignment by offsetting the thermal orthomosaic 28 pixels leftward and six pixels upward.The corrected thermal orthomosaic should have a much reduced positional error relative to the RGB orthomosaic.
Remote Sens. 2019, 11, x FOR PEER REVIEW 9 of 17 Another type of positional error is caused by the slight shutter delay between the visible and the thermal lenses (Supplementary Figure S2).This error depends on the moving direction and speed of the UAV, and in theory can be corrected for each swath and flight level.But such a swath-by-swath procedure is labor intensive.The better way of correction is to remove the error for the whole orthomosaic, as discussed below.

Quantifying and Correcting the Misalignment
We designed an object-based method to remove the positional error between the RGB and the thermal bands.Unlike other object-based algorithms [14,29], which register each of the individual photos prior to mosaicking, this method uses objects classified from the orthomosaic (Figure 6c).In total, 20 objects were analyzed.Two of the objects are shown in Figure 6a and b.Each object is characterized by a misalignment vector between the thermal band and the RGB bands.These 20 vectors are plotted in Figure 6d.In each case, the vector base is the object idenfied in the RGB mosaic and is anchored at the origin, and the vector tip is the object identified in the thermal mosaic.The vector plot helps us to determine the total misalignent, i.e., the length of the vector, as well as orientation of the misalignment.By using objects scattered across the whole region, we can determine whether different parts of the thermal orthomosaic have a generally consistent misalignment.Since these vectors have a similar length and orientation, it is acceptable to average them in order to obtain a "mean vector", whose two othognal compoments, in pixel number, are x = 28.20 and y = -5.55.The magnitude of the mean vector is 28.74, representing the mean positional error in pixel number or an actual distance of 182.79 cm.We then corrected the misalignment by offsetting the thermal orthomosaic 28 pixels leftward and six pixels upward.The corrected thermal orthomosaic should have a much reduced positional error relative to the RGB orthomosaic.

Validating the Object-based Calibration
To validate our calibrating method, both visual comparison and quantitative analysis are necessary.Figure 7 is an example showing an area in the RGB mosaic with a portal in the center showing the linked view of the same area in the thermal mosaic, before (panel a) and after calibration

Validating the Object-Based Calibration
To validate our calibrating method, both visual comparison and quantitative analysis are necessary.Figure 7 is an example showing an area in the RGB mosaic with a portal in the center showing the linked view of the same area in the thermal mosaic, before (panel a) and after calibration (panel b).A segment of gravel road is clearly recognizable in the thermal mosaic because it had a relatively low temperature.Comparing the road boundary inside and outside of the portal, it is clear that the thermal mosaic is registered much better after calibration.The same conclusion is reached using transect analysis.Figure 7c,d are variations of the DN value in thermal, green and blue bands along the horizontal transects marked in Figure 7a,b, respectively.The alignment in Figure 7d is better than in Figure 7c, because two black lines marking the two edges of the road in the Blue and Thermal bands becomes more vertical.(The road edges are not recognizable in the Green band.)relatively low temperature.Comparing the road boundary inside and outside of the portal, it is clear that the thermal mosaic is registered much better after calibration.The same conclusion is reached using transect analysis.Figure 7c and 7d are variations of the DN value in thermal, green and blue bands along the horizontal transects marked in Figure 7a and 7b, respectively.The alignment in Figure 7d is better than in Figure 7c, because two black lines marking the two edges of the road in the Blue and Thermal bands becomes more vertical.(The road edges are not recognizable in the Green band.) The alignment accuracy is further quantified with additional transects.We identified 10 transects for each of the four directions, including horizontal, vertical, main-diagonal (top-left to bottom right), and anti-diagonal (top-right to bottom-left).(Figure 7 is an example of horizontal transect).Each transect passes through an object whose temperature is visibly different from its surrounding.The offset (in pixel numbers) between the edge of the object in the thermal mosaic and in the RGB mosaic is the residual positional error.The root mean square (RMS) error for each direction is given in Table 4.The overall mean error is 6.12 (38.92 cm), which is much better than the mean error before calibration (21.64, 137.63 cm).The RMS errors are in the range of 5.09 to 7.13.The largest misalignment is 7.13 pixels, or 45.35 cm in the physical dimension, which is smaller than 57.24 cm, the size of the original raw thermal pixel before up-sampling.Furthermore, the error patterns do not show any obvious directional dependence, supporting the use of a mean vector to align the two mosaics.The alignment accuracy is further quantified with additional transects.We identified 10 transects for each of the four directions, including horizontal, vertical, main-diagonal (top-left to bottom right), and anti-diagonal (top-right to bottom-left).(Figure 7 is an example of horizontal transect).Each transect passes through an object whose temperature is visibly different from its surrounding.The offset (in pixel numbers) between the edge of the object in the thermal mosaic and in the RGB mosaic is the residual positional error.The root mean square (RMS) error for each direction is given in Table 4.The overall mean error is 6.12 (38.92 cm), which is much better than the mean error before calibration (21.64, 137.63 cm).The RMS errors are in the range of 5.09 to 7.13.The largest misalignment is 7.13 pixels, or 45.35 cm in the physical dimension, which is smaller than 57.24 cm, the size of the original raw thermal pixel before up-sampling.Furthermore, the error patterns do not show any obvious directional dependence, supporting the use of a mean vector to align the two mosaics.

Radiometric Calibration
After the calibration procedure, the thermal orthomosaic has acceptable positional error.Thus we can inspect the temperature distribution of different objects directly using their positions in the RGB orthomosaic.Prior to this pixel-by-pixel analysis, all the DN values in the thermal band should be converted to actural temperature values.
The temperature conversion formula was obtained with measurements made simultaneously with an infrared thermometer (Figure 8a; working wavelengths 8-14 µm) and the thermal camera (Figure 2b; working wavelengths 7.5-13.5 µm).The emissivity of the infrared therometer was set to 0.95.The measurements were carried out at 14:00 PM on 9 January 2019, on 40 ground objects.The air temperature and the sky condition were close to those of the UAV flight.However, the relative humidity was much lower (~40%), requiring a proper adjustment of atmospheric parameters of the camera to correct atmospheric absorption.A matrix of 20 by 20 pixels centered on the object was used to obtain the mean DN value of the object.Linear regression was performed between the sampled temperature values and the corresponding DN values (Figure 8b).The regression equation was used to convert the thermal orthomosaic to a temperature map (Supplemetary Figure S1).

Radiometric Calibration
After the calibration procedure, the thermal orthomosaic has acceptable positional error.Thus we can inspect the temperature distribution of different objects directly using their positions in the RGB orthomosaic.Prior to this pixel-by-pixel analysis, all the DN values in the thermal band should be converted to actural temperature values.
The temperature conversion formula was obtained with measurements made simultaneously with an infrared thermometer (Figure 8a; working wavelengths 8-14 μm) and the thermal camera (Figure 2b; working wavelengths 7.5-13.5 μm).The emissivity of the infrared therometer was set to 0.95.The measurements were carried out at 14:00 PM on 9 January 2019, on 40 ground objects.The air temperature and the sky condition were close to those of the UAV flight.However, the relative humidity was much lower (~40%), requiring a proper adjustment of atmospheric parameters of the camera to correct atmospheric absorption.A matrix of 20 by 20 pixels centered on the object was used to obtain the mean DN value of the object.Linear regression was performed between the sampled temperature values and the corresponding DN values (Figure 8b).The regression equation was used to convert the thermal orthomosaic to a temperature map (Supplemetary Figure S1).

Cluster Analysis
We use a cluster analysis to evaluate the temperature map.The analysis is comprised of two steps, rule-based classification and construction of temperature histograms.
A rule-based classification was applied to the RGB orthomosaic using a MATLAB program.The rules were set only based on the RGB contrast.In all, 77 % of the pixels were grouped into 9 different information classes, and 23 % remained unclassified (Figure 9).Next, a histogram of temperature distribution was constructed for each class (Figure 10).The typical temperature range of difference surfaces can be determined from the histograms.The temperature contrasts can be examined by comparing the peaks of these histograms.As expected for this time of the day (~10:00 local time), tree pixels are the coolest.Grass pixels have narrower range than tree pixels, and the majority of grass and tree pixels have similar temperatures as indicated by the very close peak values.Soil is warmer than grass because it warms up faster in the morning under sunlight.Of the three types of aquatic plants, the "cyan" plants form a thin layer floating on the water, so their temperatures are close to the temperature of the water body.The "green" and "red" plants extend above the water, so their temperatures are more different from the water temperature.The temperatures of asphalt and sandy road are intermediate.Water is the warmest because of its large heat capacity.The cluster analysis confirms that the thermal orthomosaic has captured the expected temperature distributions across the landscape.We use a cluster analysis to evaluate the temperature map.The analysis is comprised of two steps, rule-based classification and construction of temperature histograms.
A rule-based classification was applied to the RGB orthomosaic using a MATLAB program.The rules were set only based on the RGB contrast.In all, 77 % of the pixels were grouped into 9 different information classes, and 23 % remained unclassified (Figure 9).Next, a histogram of temperature distribution was constructed for each class (Figure 10).The typical temperature range of difference surfaces can be determined from the histograms.The temperature contrasts can be examined by comparing the peaks of these histograms.As expected for this time of the day (~10:00 local time), tree pixels are the coolest.Grass pixels have narrower range than tree pixels, and the majority of grass and tree pixels have similar temperatures as indicated by the very close peak values.Soil is warmer than grass because it warms up faster in the morning under sunlight.Of the three types of aquatic plants, the "cyan" plants form a thin layer floating on the water, so their temperatures are close to the temperature of the water body.The "green" and "red" plants extend above the water, so their temperatures are more different from the water temperature.The temperatures of asphalt and sandy road are intermediate.Water is the warmest because of its large heat capacity.The cluster analysis confirms that the thermal orthomosaic has captured the expected temperature distributions across the landscape.Here, clusters "Green Tree" and "Yellow Tree" have been merged into one cluster called "Tree".

Discussion
This work aims to design a novel method, four-band thermal mosaicking (FTM), for mosaicking thermal imagery.The key to overcoming the difficulty associated with the coarse resolution and lack of single-band contrast of thermal images lies in processing all the images carefully prior to the mosaicking.Once the four-band images are generated, the workflow in Pix4D, an SfM-based application, is similar to mosaicking RGB images.No geographic information is needed in advance.The up-sampling process should be done with care.For the FLIR DUO PRO camera, the visible and thermal lenses have the same vertical FOV.That the visible band resolution is a multiple of the thermal-band resolution greatly simplifies the up-sampling process.For other dual cameras types without consistent FOVs in any direction or multiple relationship between resolutions, both RGB and thermal images need to be cropped to have matched FOVs and then up-sampled to the resolution of their minimum common multiple.Toward that end, more tests are needed for other camera types (including self-designed dual-lens cameras).Because cropping will decrease image size, feature matching may not be as precise as using the uncropped RGB images.
In the present study, the thermal and RGB orthomosaics have identical sampled areas.Two types of positional error are analyzed.The first type, the lens spacing error, is negligible (less than 6.36 cm).Therefore, we only need to calibrate the whole orthomosaic in order to remove the second type of error -misalignment caused by shutter delay between the two lenses.It is worth noting that object-based calibration is only feasible if the vectors of misalignment of the selected calibration objects are similar, or if the thermal orthomosaic deviates from the RGB orthomosaic by a uniform offset.Erratic error vectors are likely if the UVA flight pattern has no dominant direction of motion.Under this circumstance, the misalignment needs to be fixed for each single swath.In the present case, a simple offset correction to the thermal orthomosaic has produced acceptable results, as shown by the transect analysis.Compared to some other studies [36] (p.4016), our positional error is higher in actual distance (39 cm), but is within one original thermal pixel size (57 cm).
The error analysis is based on the ground features found in the final orthomosaics.The overall performance is essentially limited by the resolution of the thermal lens: features smaller than one original thermal pixel are not identifiable in the thermal orthomosaic.The primary goal of the objectbased calibration in this work is just to reduce the positional error to less than the original thermal pixel size (57 cm), which we have achieved.Further improvement of the performance requires the use of higher-resolution thermal cameras with a smaller raw pixel size.Here, clusters "Green Tree" and "Yellow Tree" have been merged into one cluster called "Tree".

Discussion
This work aims to design a novel method, four-band thermal mosaicking (FTM), for mosaicking thermal imagery.The key to overcoming the difficulty associated with the coarse resolution and lack of single-band contrast of thermal images lies in processing all the images carefully prior to the mosaicking.Once the four-band images are generated, the workflow in Pix4D, an SfM-based application, is similar to mosaicking RGB images.No geographic information is needed in advance.The up-sampling process should be done with care.For the FLIR DUO PRO camera, the visible and thermal lenses have the same vertical FOV.That the visible band resolution is a multiple of the thermal-band resolution greatly simplifies the up-sampling process.For other dual cameras types without consistent FOVs in any direction or multiple relationship between resolutions, both RGB and thermal images need to be cropped to have matched FOVs and then up-sampled to the resolution of their minimum common multiple.Toward that end, more tests are needed for other camera types (including self-designed dual-lens cameras).Because cropping will decrease image size, feature matching may not be as precise as using the uncropped RGB images.
In the present study, the thermal and RGB orthomosaics have identical sampled areas.Two types of positional error are analyzed.The first type, the lens spacing error, is negligible (less than 6.36 cm).Therefore, we only need to calibrate the whole orthomosaic in order to remove the second type of error-misalignment caused by shutter delay between the two lenses.It is worth noting that object-based calibration is only feasible if the vectors of misalignment of the selected calibration objects are similar, or if the thermal orthomosaic deviates from the RGB orthomosaic by a uniform offset.Erratic error vectors are likely if the UVA flight pattern has no dominant direction of motion.Under this circumstance, the misalignment needs to be fixed for each single swath.In the present case, a simple offset correction to the thermal orthomosaic has produced acceptable results, as shown by the transect analysis.Compared to some other studies [36] (p.4016), our positional error is higher in actual distance (39 cm), but is within one original thermal pixel size (57 cm).
The error analysis is based on the ground features found in the final orthomosaics.The overall performance is essentially limited by the resolution of the thermal lens: features smaller than one original thermal pixel are not identifiable in the thermal orthomosaic.The primary goal of the object-based calibration in this work is just to reduce the positional error to less than the original thermal pixel size (57 cm), which we have achieved.Further improvement of the performance requires the use of higher-resolution thermal cameras with a smaller raw pixel size.
In theory, UAV images can also be processed in a continuous domain like HSV (hue-saturation-value) since the HSV domain separates the information of color (hue) and intensity (value) from the RGB composite.It is also possible that the contrast between adjacent objects becomes sharper after the conversion from RGB to HSV.We tested this idea by converting the RGB photos to HSV images and then scaling the HSV values to a 16-bit format.We then processed the images three times using Pix4D with three different band allocations: (1) main band-weight on hue, (2) main band-weight on saturation, and (3) main band-weight on value.We found that the orthomosaic is more successful with the main band-weight allocated to value.This HSV-based orthomosaic is overall consistent to the original RGB-based orthomosaic (Supplementary Figure S3).However, the two orthomosaics differ in detail.In many places, the HSV-based orthomosaic shows serious errors in mapping the true colors (Supplementary Figure S4).The erratic color distribution would lead to a poor registration of the thermal band at these places.For this reason, we have decided to use the original RGB-mosaicking method.
Our temperature map has been evaluated by cluster analysis.However, our temperature measurement assumed a fixed value of emissivity.This may explain why some low-emissivity objects, such as the metal roof of a garage (Figure 4) has a very low temperature (−6.7 • C, Supplementary Figure S1).In the work of Turner et al. [36], no unexpected temperature values are evident even though a similar measurement method was used.This is because the surface of our study area is more heterogeneous as opposed to the Moss Bed in their study.Nevertheless, the primary purpose of radiometric calibration is to convert pixel DN values to values on an actual temperature scale.The cluster analysis is not meant to indicate that the actual temperature had been correctly retrieved for all the objects in the scene.Instead, these temperature values are used to determine if the thermal mosaic has captured the expected contrast of thermal properties across the landscape.
Although not helpful to positional accuracy, the correction of atmospheric interference may bring improvements to surface temperature estimation.MODTRAN (radiative transfer modeling for atmospheric correction) is regarded as a reliable model to correct the impact of atmospheric absorption of radiation [43].Berni et al. [44] applied MODTRAN to their UAV photogrammetry and found that the error due to atmospheric absorption is dependent on surface temperature and is lower for cooler objects.At the maximum temperature value of 15 • C in our landscape, we infer from their data that the error in our study should be no more than 1 • C. The actual error should be lower than this because our UAV flight was carried out in late fall at a lower atmospheric moisture and at a lower flight height than theirs.

Conclusions
In this study, a novel method, termed four-band thermal mosaicking (FTM), was developed to process thermal images from unmanned aerial vehicles (UAVs).The method requires the synchronized capture of RGB and thermal images using a dual-lens camera.The thermal images are subsequently resized and stacked onto their corresponding RGB photos in order to create four-band images.Afterwards, the four-band images are mosaicked via an SfM workflow using only the RGB bands.The method overcomes the problem of coarse resolution and low contrast in the thermal band in thermal mosaicking.The four-band orthomosaic can then be separated into an RGB and a thermal mosaic for further analysis.
The FTM was applied successfully to images acquired in a flight mission over an urban landscape.To improve the result, we developed an object-based calibration method to correct the positional error between the RGB orthomosaic and the thermal orthomosaic.In the current study, the error arose mainly from shutter delay between the lenses.The calibration reduced the mean positional error from 138 cm to 39 cm, which is less than one thermal pixel size (57 cm).
The FTM workflow opens opportunities for low-cost but high-resolution UAV observation of the thermal environment.It has scientific merits in many fields of application such as wildlife survey [45,46], wildfire detection [1,2], and urban climate studies [47], all of which require detailed thermal data.Future work is needed to test the method with different camera configurations.With the help of higher-resolution thermal cameras, further reduction of inter-band positional error may be possible.Additional improvements may also result from more rigorous correction for atmospheric interference and the use of ground targets of known temperatures in order to generate the actual temperature map.

Figure 1 .
Figure 1.The satellite map of Beaver Pond Park, New Haven, Connecticut, United States, with an inset map indicating its location.The green box indicates the approximate area covered by the UAV observation.

Figure 1 .
Figure 1.The satellite map of Beaver Pond Park, New Haven, Connecticut, United States, with an inset map indicating its location.The green box indicates the approximate area covered by the UAV observation.

Figure 2 .
Figure 2. Photos of instruments: (a) is DJI Phantom 4 Quadcopter and (b) is the FLIR DUO R dualsensor thermal camera.

Figure 2 .
Figure 2. Photos of instruments: (a) is DJI Phantom 4 Quadcopter and (b) is the FLIR DUO R dual-sensor thermal camera.

Figure 3 .
Figure 3.The workflow of pre-processing the UAV images, mosaicking, and post-processing the orthomosaics.A sample of office scene is used to demonstrate how the thermal image (in gray scale) is up-sampled and then stacked onto its corresponding RGB image, the latter of which have been cropped and radiometrically stretched.The four-band images are mosaicked by SfM workflow in PIX4D.The generated four-band orthomosaic is georeferenced by Esri's Arcmap.The labels in red indicate the software required in the workflow.

Figure 3 .
Figure 3.The workflow of pre-processing the UAV images, mosaicking, and post-processing the orthomosaics.A sample of office scene is used to demonstrate how the thermal image (in gray scale) is up-sampled and then stacked onto its corresponding RGB image, the latter of which have been cropped and radiometrically stretched.The four-band images are mosaicked by SfM workflow in PIX4D.The generated four-band orthomosaic is georeferenced by Esri's Arcmap.The labels in red indicate the software required in the workflow.

Figure 4 .
Figure 4.The georeferenced (a) RGB and (b) thermal orthomosaics of Beaver Pond Park.In the right panel, the legend is the main range of DN values of the thermal band.The red circle on the thermal orthomosaic indicates a black spot with an extremely low DN value (2250) due to the low-emissivity of a metal roof on a garage (identifiable on the RGB).

Figure 4 .
Figure 4.The georeferenced (a) RGB and (b) thermal orthomosaics of Beaver Pond Park.In the right panel, the legend is the main range of DN values of the thermal band.The red circle on the thermal orthomosaic indicates a black spot with an extremely low DN value (2250) due to the low-emissivity of a metal roof on a garage (identifiable on the RGB).

Figure 5 .
Figure 5.The diagram showing the lens spacing error.In (a), the solid triangle and the dashed triangle represent the FOV of the visible and the thermal lens, respectively.The thermal lens FOV is narrower and is to the left of the visible lens FOV when the camera is in forward motion.For the object located on the ground (green cycle), the sight lines of the visible lens (solid green arrow) and the thermal lens (dashed green arrow) both point to the object on the ground (green circle), and for the object above the ground (red circle), the intersection point of the sight lines (solid red and dashed red arrows) is elevated.In (b), the solid and the dash arrows are the sight lines of the visible and the thermal lenses, respectively, where  is lens spacing,  is distance between the on-board camera and the object being observed, H is the height of the object above the ground, and   and   refer to two different contributors of Lens Spacing Error.

Figure 5 .
Figure 5.The diagram showing the lens spacing error.In (a), the solid triangle and the dashed triangle represent the FOV of the visible and the thermal lens, respectively.The thermal lens FOV is narrower and is to the left of the visible lens FOV when the camera is in forward motion.For the object located on the ground (green cycle), the sight lines of the visible lens (solid green arrow) and the thermal lens (dashed green arrow) both point to the object on the ground (green circle), and for the object above the ground (red circle), the intersection point of the sight lines (solid red and dashed red arrows) is elevated.In (b), the solid and the dash arrows are the sight lines of the visible and the thermal lenses, respectively, where d is lens spacing, D is distance between the on-board camera and the object being observed, H is the height of the object above the ground, and d 1 and d 2 refer to two different contributors of Lens Spacing Error.

Figure 6 .
Figure 6.Removal of misalignment between the RGB and the thermal orthomosaics.(a,b): two typical views clipped from the orthomosaics, with two recognizable objects [a hot car engine in (a) and a cold red tree in (b)].The left panel shows the RGB view and the right panel shows the thermal view.The black arrow is the vector of misalignment.(d): 20 misalignment vectors (black arrows) sampled from (c) and the mean misalignment vector (red arrow).The two red numbers are components of the mean vector.

Figure 6 .
Figure 6.Removal of misalignment between the RGB and the thermal orthomosaics.(a,b): two typical views clipped from the orthomosaics, with two recognizable objects [a hot car engine in (a) and a cold red tree in (b)].The left panel shows the RGB view and the right panel shows the thermal view.The black arrow is the vector of misalignment.(d): 20 misalignment vectors (black arrows) sampled from (c) and the mean misalignment vector (red arrow).The two red numbers are components of the mean vector.

Figure 7 .
Figure 7.An example showing removal of image misalignment.(a,b): clipped views of the same area before and after calibration, respectively.(c,d): band profiles along the transects marked in (a) and (b), with the Infrared band in red corresponding with the right y-axis, the Green band in green, and the Blue band in blue, both corresponding with the left y-axis.The black lines mark the road edges.

Figure 7 .
Figure 7.An example showing removal of image misalignment.(a,b): clipped views of the same area before and after calibration, respectively.(c,d): band profiles along the transects marked in (a) and (b), with the Infrared band in red corresponding with the right y-axis, the Green band in green, and the Blue band in blue, both corresponding with the left y-axis.The black lines mark the road edges.

Figure 8 .
Figure 8. (a): Infrared thermometer used to measure temperature of ground objects; (b): Linear regression between the sampled temperature of 40 objects and their thermal DN values.Also shown are the regression equation, coefficient of determination (R 2 ), confidence level (p), the confidence interval of the predictions (blue dashed lines), and the standard errors of coefficients (SE). .

Figure 8 .
Figure 8. (a): Infrared thermometer used to measure temperature of ground objects; (b): Linear regression between the sampled temperature of 40 objects and their thermal DN values.Also shown are the regression equation, coefficient of determination (R 2 ), confidence level (p), the confidence interval of the predictions (blue dashed lines), and the standard errors of coefficients (SE).

Figure 9 .
Figure 9. Rule-based classification of the RGB orthomosaic.The coordinate axes are number of the pixels.

Figure 9 .
Figure 9. Rule-based classification of the RGB orthomosaic.The coordinate axes are number of the pixels.

Figure 10 .
Figure 10.The temperature histograms of the clusters showing in Figure 9. PN is the pixel number.Here, clusters "Green Tree" and "Yellow Tree" have been merged into one cluster called "Tree".

Figure 10 .
Figure 10.The temperature histograms of the clusters showing in Figure 9. PN is the pixel number.Here, clusters "Green Tree" and "Yellow Tree" have been merged into one cluster called "Tree".

Table 2 .
Technical specifications of FLIR DUO R dual-sensor imager.

Table 3 .
The band weight allocation in PIX4D Image Properties Editor.

Table 4 .
Misalignment error in pixel number in four directions after calibration.

Table 4 .
Misalignment error in pixel number in four directions after calibration.