2. Related Work
Digital cameras find many applications in remote sensing. However, the spectral information contained in their imagery is typically limited to the three bands (red, green and blue) of a colour image. Multispectral cameras partially overcome this limitation, but these sensors are also restricted in the number of frequency bands they make available per frame. In contrast, hyperspectral cameras provide potentially hundreds of spectral bands per pixel. The use of these cameras has therefore increased significantly in recent years, mainly due to the availability of more cost-effective sensors and versatile platforms [11]. Systems are also improving in terms of accuracy, efficiency, signal-to-noise ratio and ease of use; and of these systems, pushbroom sensors are the most widely available. In part, this is because pushbroom sensors do not require a complex mechanical scanning mechanism. Those instruments that do employ scanning mechanisms are generally known as whisk-broom sensors [12,17] and suffer from increased weight, volume and cost.
Pushbroom sensors [18,19,20,21,22,23,24] offer high spectral and spatial resolution by sampling individual spectral lines. However, such instruments are effectively line scanners, and their spatial accuracy is highly dependent on flying conditions and the stability provided by the host platform and gimbal. The resulting errors are thus constrained by the accuracy of the navigation sensors, which are required to compensate for the image distortion that often results from the geometric noise induced by a UAV’s motion [4,11,25].
In an attempt to address this, several researchers have investigated methods that make use of multiple pushbroom sensors [26,27,28]. Similarly, lower spectral and spatial resolution multispectral cameras also offer a valuable alternative [11,29,30,31,32,33]. These cameras collect band-sequential spectra in two spatial dimensions or integrate multiple synchronised cameras. In both cases, the mosaicking process is able to make use of photogrammetric techniques such as structure from motion (SfM) [34,35,36,37], which provides spatial accuracy in their image products. This makes them well suited to campaigns in real-world environments. By contrast, the routine use of hyperspectral pushbroom systems remains challenging, in part due to the lack of automation and processing options available [38].
Notwithstanding, several researchers have developed techniques for classifying materials and artefacts [39,40,41,42] using hyperspectral imagery. However, despite the radiometric accuracy of these techniques, the resulting images typically display considerable geometric deformation, making their combination with other data sources difficult.
High-precision stitching with small overlaps is important for many applications; and accurately overlapping swaths typically requires a large number of GCPs to avoid further distortion in the final mosaic [11]. Some researchers have attempted ortho-rectification based on Global Navigation Satellite System/Inertial Measurement Unit (GNSS/IMU) parameters, a digital elevation model (DEM) and the PARGE software [43]. Others have used a dense network of GCPs in addition to PARGE [22,44]. The studies report that a 5 cm root-mean-square error (RMSE) was achieved using hyperspectral imagery of a 2–4 cm ground resolution distance (GRD). However, the field deployment of GCPs is time-consuming and the PARGE software costs up to AUD 8500 for a node-locked licence.
Traditionally, additional time-consuming effort is also required during the post-processing stages to overcome the geometric distortions introduced through any UAV movement during flight. To address this problem, Angel et al. [8] developed a computationally robust automated georectification and mosaicking methodology that operates in a parallel computing environment running on a high-end Intel Xeon E5-2680 v2 processor (20 cores running at 2.8 GHz with 200 GB of RAM). They evaluated their results against several high-spatial-resolution (mm to cm) data sets collected using a pushbroom hyperspectral sensor and an associated RGB frame-based camera. The technique estimates the luminance of the hyperspectral swaths and co-registers these against an RGB-based orthomosaic using the speeded-up robust features (SURF) algorithm to identify common features between each swath (derived during a pre-processing stage) and the RGB-based orthomosaic. The maximum likelihood estimator sample consensus (MLESAC) approach then fits the best geometric transformation model to the retrieved feature matches. Individual scanlines are thus geometrically transformed and merged into a single spatially continuous mosaic using only a small number of GCPs. The technique offers an 85% improvement in co-registration processing speed relative to traditional manual/user-generated procedures.
Although the results provided by this method are excellent, and it is a great advance over previous approaches, it should be noted that considerable computational power and several hours were needed to process the data. As a result, the significant resources demanded may preclude its broader use. Moreover, the raw hyperspectral swaths used in the feature matching steps were generated during an initial pre-processing stage rather than during the (timed) MATLAB workflow described in the paper. The researchers also concluded that different illumination conditions and the use of different sensors (RGB vs. hyperspectral), resulting in distinct geometries, influence the number of features detected in each data set, and thus the georectification performance.
In regard to the creation of 3D point clouds, light detection and ranging (LiDAR) sensors have been integrated with hyperspectral sensors [45,46]. Such combinations allow both data sets to be synchronised, mitigating UAV platform positioning errors with respect to sensor resolution and any platform instability in flight. They also allow the hyperspectral characterisation of solid objects. Despite promising results, LiDAR sensors, even those considered low-cost, are expensive and require UAVs with a higher payload capacity. Consequently, such integrations are not yet readily accessible to everyone.
As a result, a number of researchers have developed other approaches to obtain models with information fed from different sensors. For instance, ref. [47] developed a technique that applies SfM to images from different wavelengths and a 3D registration method to combine band-level models into a single 3D model. Similarly, ref. [48] describes an out-of-core method that enables the on-site processing and visualization of data by using a collaborative 3D virtual environment. Other works have merged hyperspectral data with 3D models generated from standard RGB cameras or laser scanning to model geological structures, map underwater environments and carry out plant phenotyping [49,50,51].
The processes needed to deal with the large quantities of 3D data and hyperspectral images are usually computationally demanding, and ref. [52] describes an approach based on the efficient generation of hyperspectral point clouds with a low-memory footprint. The procedure is run on a GPU using OpenGL’s compute shaders to build and render the 3D hyperspectral point cloud.
In view of the above, and in line with the goals espoused by Angel et al., the aim of this research was “to develop an automated workflow capable of producing accurate georectified UAV-based hyperspectral mosaics collected by pushbroom sensors”, and to extend the workflow of [8] by creating 3D hyperspectral point clouds. In addition to adding geometric features to any image analysis, the generation of a 3D model that is enriched with hyperspectral data helps in the study of the distribution of spectral information relative to surface orientations under real-world conditions.
Thus, the aim of this research was to minimise human intervention during the geoprocessing stage, avoid reliance on GCPs and fully automate an efficient “batch processing” co-registration, mosaicking and point cloud generation strategy. The goal was also to reduce the computational requirements of any technique such that the algorithms could run rapidly on readily available computing equipment.
The methodology proposed here improves a technique that provides a solution for one of the most time-consuming post-processing stages of UAV-based hyperspectral remote sensing using pushbroom sensors. It implements a simplified co-registration and mosaicking strategy and generates positionally accurate 3D hyperspectral point clouds; and it does so in a computationally efficient manner.
4. Methods
The automated georectification, mosaicking and 3D DPC generation workflow described in this paper is shown in Figure 3. It was fully coded in MATLAB 2023a and ran on a Dell laptop with 32 GB RAM and an Intel(R) Core i7-8850H CPU running at 2.60 GHz. The following sections describe the workflow.
Generation of RGB Orthomosaic and Dense Point Cloud: The sequence of Phantom IV images observed at each site was converted into a 3D dense point cloud (DPC) and an orthomosaic image using Agisoft’s MetaShape v1.8.3 software (www.agisoft.com). MetaShape uses SfM, which consists of two main steps: camera motion estimation and reconstruction of the DPC. Initially, a sparse set of points is matched across the acquired 2D image stack to determine correspondences. Such features are typically extracted using algorithms like SIFT (scale-invariant feature transform) [54] and SURF (speeded-up robust features) [55]. The end result is a collection of $N$ points (where $N$ is typically of the order of $10^8$), represented by the spatio-spectral function, $s_i = (x_i, y_i, z_i, R_i, G_i, B_i)$, for $i = 1, \ldots, N$.
MetaShape offers varying levels of point cloud (and hence orthomosaic) resolution: ultra-high, high, medium and low. The point clouds used in this study were generated using “high” levels of resolution, which typically represent a text file size of around 8–10 GB and a DPC density of about 3–4 cm (about $10^8$ points). The TIF images, created as orthomosaics from the DPC with 2 cm GRD, typically have file sizes of about 2–3 GB. MetaShape uses a bundle adjustment to initially align the images (40,000 key points, 4000 tie points).
Although the GRD of the orthomosaic RGB image and hyperspectral data were around 1–2 cm, both were resampled to a common resolution of 5 cm. Bi-linear interpolation was used, with the value of the output pixel estimated from an average of its four closest neighbours. This ensured the processing would run on a 32 GB RAM computer. At least 64 GB RAM was required to run the data for the 15 Ha plots at a higher 2.5 cm GRD. In comparison, ref. [8] made use of ultra-high levels of resolution, and while the tomato and date palm plots mosaicked were smaller, the observed resolutions were higher (the UAV flights were at a lower altitude in [8]). Overall, however, the number of pixels generated per georectified mosaic in both studies was of a similar order, i.e., around $10^8$ pixels/mosaic and $10^6$ pixels/hyperspectral image. The number of UAV flight lines was also similar (11–15 flight lines per site in this study).
Raw Hyperspectral Data Processing: Pushbroom sensors deliver raw pixel data in the form of rows and columns. In other words, the raw data have no pre-associated geographic frame of reference. Such images must first be corrected using a process known as georectification to remove the geometric and location distortions. During this process, pixels are first transformed to a common plane to correct for geometric distortions. They are then georeferenced, i.e., real-world geodetic coordinates are assigned to each pixel in the image.
This initial correction was performed using SpectrononPro3 v3.4.4 software (www.resonon.com), which can be operated in batch mode such that it outputs individual TIF images and/or data cubes, both corrected for distortion induced by the camera’s 3D motion. Unfortunately, GNSS-IMU sensor errors lead to geometric errors in the pre-processed output cubes/images, which is why additional processing is needed. Errors also arise from other sources, including mechanical and optical imperfections in the sensor, atmospheric effects and imperfections in the digital terrain (or flat earth) model. Importantly, SpectrononPro3 includes the WGS84 coordinates of each corner of the image/cube (henceforth referred to only as cubes). As with the distortion corrections, however, these coordinates are contaminated with errors.
TIF images are generated from the corrected cube output comprising the three-colour RGB combination of the spectral bands (observed by the Pika-L sensor) closest to 670 nm (red), 540 nm (green) and 480 nm (blue). Although radiance and reflectance corrections can be applied, in this study they were not. As per [8], these individual images can then be mosaicked into swaths and processed as per the rest of this workflow, skipping the next two steps. However, in order to accord with the workflow of [8], additional user effort is required to break contiguous raw data sequences—sets of consecutive cubes/files—into file sets commensurate with each swath; and even more effort is needed if the pitch, roll, yaw and timing offsets need to be optimised. To increase the degree of automation, we therefore introduced two simple steps: flight path analysis and image rejection and repair.
Flight Path Analysis: The flight path of the Matrice 600 UAV is analysed, and the corners of the lawnmower path automatically extracted. As the overall flight path of the UAV from take-off to landing can involve a considerable number of manoeuvres beyond those involved in the standard lawnmower pattern (e.g., to/from the lawnmower start/end points, rotations to orient the UAV along the line of flight, etc.), and there is a need for the approach to be robust to real-world data sets, this method is worthy of brief description.
First, the direction of travel, $\theta_t$, at each epoch, $t$, is computed over the entirety of the flight; $t = 1$ being the epoch at which the first image is observed by the UAV and $t = T$ the last. A histogram of $\theta_t$ is formed and the bin with the most samples, $\theta_{\max}$, is deemed the primary direction of travel. The primary direction is referred to as $d = 1$, and direction numbers, $d = 1, \ldots, 4$, are assigned to each epoch for which $\theta_t$ satisfies the inequality, $|\theta_t - \theta_d| < \theta_T$ (where $\theta_d = \theta_{\max} + (d - 1) \times 90°$), with $\theta_T$ a threshold (typically 10°) used to compute the four primary directions/quadrants of travel for a lawnmower pattern. $d = 0$ is used to represent epochs during which the UAV is not travelling in any of the four primary directions. Thus, each epoch is assigned a parameter, $d_t$, to indicate whether it is travelling (say) north, south, east or west, or none of these directions.
A set of change points, $t_c$, is then computed by finding the epochs for which there is a change in the direction number, $d_t$. Change points too close together, geographically or temporally, are excluded, as are those that form too straight a path to both their neighbours at epochs $t_{c-1}$ and $t_{c+1}$. Path lengths between the remaining change points (ostensibly corners of the lawnmower) are then computed and histograms are again used to find the most likely path lengths, which are assigned long or short based on their relative sizes. Simple checks are then applied to ensure the paths form a long–short–long sequence, start and end on a long path length, etc. This ensures any flight manoeuvres that precede the sequence of swaths used to overfly the site are not included in the processing (unless desired).
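A minimal MATLAB sketch of this direction-classification step follows, assuming per-epoch east/north coordinates E and N and that the four primary directions are 90° apart (the bin width and variable names are illustrative):

```matlab
% Sketch of the flight-path direction analysis (illustrative names).
theta = mod(atan2d(diff(N), diff(E)), 360); % direction of travel per epoch
[counts, edges] = histcounts(theta, 0:5:360); % 5-degree histogram bins
[~, iMax] = max(counts);
thetaPrimary = mean(edges(iMax:iMax+1));    % primary direction of travel

thetaT = 10;                                % threshold (degrees)
d = zeros(size(theta));                     % 0 = none of the four directions
for q = 0:3                                 % four quadrants of the lawnmower
    dq = mod(theta - (thetaPrimary + 90*q) + 180, 360) - 180;
    d(abs(dq) < thetaT) = q + 1;
end
changePoints = find(diff(d) ~= 0) + 1;      % candidate lawnmower corners
```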
Using the centroids of the GNSS coordinates (Figure 4), $c_{i,1}, c_{i,2}, c_{i,3}, c_{i,4}$, which represent the corners of each hyperspectral image/cube, $I_i$, all of the cubes/images that align with each leg (long and short) of the UAV flight are then automatically grouped together into $m$ potential swaths. That is, images are combined and processed together in $m$ groups, $S_j = \{I_1, \ldots, I_{n_j}\}$, where $n_j$ is the number of images in the $j$-th swath (noting, despite the lawnmower flight path, due to the asymmetry of images during turns, each swath does not necessarily contain an identical number of images).
The path is then identified from the image sequence using the assumption that each image obtained is file stamped by the hyperspectral sensor in monotonic order. That is, because the UAV often meanders before or after flying its designated (lawnmower) path or starts flying its intended path only to be tasked to re-start part-way through (so images observed during irregular flight phases are associated incorrectly (out of sequence) with the lawnmower pattern phase), the total misfit between the image file order and the $m$ swaths is minimised. For speed, this is achieved using the number sequence of the image files.
Image Rejection and Repair: Generally, if a UAV is travelling at a constant forward velocity and environmental factors such as turbulence have little impact, the data cubes present as roughly rectangular entities. However, if the UAV rotates quickly at the end or start of a swath or environmental factors cause the UAV to accelerate for some other reason and the hyperspectral camera’s navigation sensors do not update quickly enough, the raw georectification processing can introduce distortions into the imagery. These distortions typically manifest as non-rectangularly shaped images (bulges or extensions on one or more of the image sides) and/or gaps or indentations in the continuity of the imagery, usually on the sides of the images.
Repair: Before image registration commences, each cube is morphologically and geometrically checked for integrity. First, a binary mask of the RGB image is created and the properties of all eight-connected structures in the image computed. An eight-connected structure is an object whose pixels are all connected to one another via their edges or corners (i.e., each pixel may touch up to eight neighbours). If the image contains more than one structure (i.e., the image has been broken into multiple components) or the number of pixels in the binary mask falls below 99.9% of what should occupy a convex polygon described by the mask’s boundary, an attempt to repair the image is made: bilinear interpolation is applied to “fill” the empty pixels before the morphological and geometric tests are applied.
Rejection: A closing operation is then applied to the binary mask, which removes any small gaps in the data (closing involves dilation followed by erosion, where dilation enlarges or thickens the foreground structure and erosion thins it). Finally, the perimeter and area of each contiguous region of the image, $P_r$ and $A_r$, respectively, are computed. The boundary points of the largest structure and the four extrema of this boundary are identified, and a polygon is fitted to these.
The perimeter and area of the polygon, $P_p$ and $A_p$, respectively, are then also computed. Any images whose ratios $P_r/P_p$ or $A_r/A_p$ fall outside set thresholds are considered insufficiently rectangular to be usefully integrated into a swath composite. Before they are rejected, however, another geometric test is applied. The angle, $\theta_s$ (where $s = 1, \ldots, 4$), that describes the slope of each side of the polygon is computed as modulo 90°, and the values $|\theta_s - \theta_u|$ (for $s \neq u$) are computed (Figure 4). If 3 or more such values fall below a threshold, $\theta_T$ (typically 10°), the polygon is considered to contain two parallel sides and one orthogonal one and is thus retained. Any sequence of images containing only one image is also rejected.
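A sketch of how such a rectangularity test might be instantiated in MATLAB is shown below; the threshold values are illustrative assumptions, since the paper's ratios were derived empirically.

```matlab
% Sketch of the morphological integrity test for one cube's RGB image
% (illustrative thresholds; rgbCube is an assumed input image).
bw = imbinarize(rgb2gray(rgbCube));       % binary mask of the image
bw = imclose(bw, strel('disk', 5));       % close small gaps (dilate + erode)
cc = bwconncomp(bw, 8);                   % eight-connected structures
stats = regionprops(cc, 'Area', 'Perimeter', 'Extrema');
[~, iBig] = max([stats.Area]);            % largest contiguous region
Pr = stats(iBig).Perimeter;  Ar = stats(iBig).Area;

% Fit a polygon to the extremal boundary points and compare
% perimeter and area ratios against (illustrative) thresholds.
ex = stats(iBig).Extrema;                 % 8 x 2 extremal boundary points
Ap = polyarea(ex(:,1), ex(:,2));
Pp = sum(vecnorm(diff(ex([1:end 1], :)), 2, 2));
isRectangular = (Pr/Pp < 1.1) && (Ar/Ap > 0.9);
```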
Luminance Normalisation: The luminance of each pixel, $L_O$ and $L_H$, respectively, is retrieved from both the coloured RGB orthomosaic and hyperspectral images using the standard formula defined by the National Television System Committee (NTSC) [56], $L = 0.299R + 0.587G + 0.114B$. This retains the luminance but discards the hue and saturation information in the images. Unlike [8], however, the luminance is first modified using $L_H' = L_H (\bar{L}_O / \bar{L}_H)$, where $\bar{L}_O$ and $\bar{L}_H$ are the mean values of luminance of the non-zero pixels in the orthomosaic and hyperspectral images.
The white balance of the pixels is also modified in a similar way. That is, each RGB channel is corrected individually such that the RGB value of each pixel becomes $C' = C (\bar{L}_H / \bar{C}_H)$, where $\bar{C}_H$ ($C \in \{R, G, B\}$) represents the mean values of the red, green or blue non-zero pixels, respectively, in the hyperspectral image. The modified values of $C'$ are then concatenated into a three-colour RGB image. As the RGB values of each hyperspectral image output by SpectrononPro3 are maximally scaled within the range 0–255, the luminance of the orthomosaic pixels, $L_O$, is similarly rescaled such that its minimum and maximum values fall between 0 and 255. This accounts for (partial) variation in lighting conditions that can occur between the RGB and hyperspectral sensor flights.
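A minimal sketch of this normalisation, assuming ortho and hsRGB are double arrays scaled 0–255 (names are illustrative), might read:

```matlab
% NTSC luminance of each image (hue/saturation discarded).
Lo = 0.299*ortho(:,:,1) + 0.587*ortho(:,:,2) + 0.114*ortho(:,:,3);
Lh = 0.299*hsRGB(:,:,1) + 0.587*hsRGB(:,:,2) + 0.114*hsRGB(:,:,3);

% Correct each hyperspectral channel individually so its mean over the
% non-zero pixels matches the mean hyperspectral luminance (white balance).
for c = 1:3
    band = hsRGB(:,:,c);
    hsRGB(:,:,c) = band * (mean(Lh(Lh > 0)) / mean(band(band > 0)));
end

% Rescale the orthomosaic luminance into the 0-255 range.
Lo = 255 * (Lo - min(Lo(:))) / (max(Lo(:)) - min(Lo(:)));
```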
Creation of Composite Swath Images: In [8], the hyperspectral swaths are created during the pre-processing stage of the raw data. This can require considerable user effort as the start and end images of each swath must be identified and any corrupted hyperspectral imagery excluded from the workflow. The individual swaths must then be mosaicked and named (often sequentially) for subsequent processing.
To reduce user effort and avoid computationally intensive convolution operations that approximately locate the swaths within the orthomosaic, we create a 3D matrix, $C$, of dimensions $U \times V \times 3$, where $U = (N_{\max} - N_{\min})/\mathrm{GRD}$ and $V = (E_{\max} - E_{\min})/\mathrm{GRD}$; $E$ and $N$ represent the extrema of the orthomosaic image and GRD the desired ground resolution distance at which the image matching is to be conducted (0.05 m in this study). The third dimension of the matrix, $C$, has one channel reserved for each band of the RGB triplicate.
Commencing at the first image of the first swath, $I_1 \in S_1$, each of the $n_j$ (Figure 5) hyperspectral images is processed in turn. First, the extrema, $E_{\min}^i, E_{\max}^i, N_{\min}^i, N_{\max}^i$, are computed from $c_{i,1}, \ldots, c_{i,4}$, the coordinates of the corners of the hyperspectral image. Then, the image is resampled into a matrix of dimensions, $u_i \times v_i \times 3$, where $u_i = (N_{\max}^i - N_{\min}^i)/\mathrm{GRD}$ and $v_i = (E_{\max}^i - E_{\min}^i)/\mathrm{GRD}$. Next, this resampled matrix is allocated to a sub-set of pixels, $C(r_1{:}r_2, q_1{:}q_2, :)$, where $r_1$, $r_2$, $q_1$ and $q_2$ are the row and column indices corresponding to the image’s extrema, within an (initially blank) matrix of dimensions $U \times V \times 3$. Although this is an inaccurate placement, not least because the corners of the hyperspectral images do not form a rectangle, this is performed because it is computationally much faster to allocate a rectangular $u_i \times v_i \times 3$ sub-block of pixels in memory than to accurately assign, pixel-by-pixel, a slightly tilted, trapezoidal image to the correct elements within the relevant matrix needed to form part of the composite image representing each swath.
However, it also means the image formed by the $u_i \times v_i \times 3$ sub-block of pixels, $R_i$, is slightly distorted relative to its true orientation. $R_i$ is therefore transformed, using an affine transformation, from its position described by the rectangle corners, $(r_1, q_1)$, $(r_1, q_2)$, $(r_2, q_2)$, to its correct orientation, $I_i'$, defined by the corner coordinates of the image, $c_{i,1}, \ldots, c_{i,4}$. Executing a transform of $R_i$ at this stage, and then combining the transformed images, $I_i'$, into swaths, rather than attempting a block transform of the swath $S_j$ to its transformed version, $S_j'$ (to account for both the georectification errors in the swath and the orientation errors in the individual images), delivers an improved performance. This is because the global transformation applied to $S_j$ is typically unable to account for the individual (small) adjustments in orientation required by each of the individual images, $I_i$.
For hyperspectral images $I_i$, $i > 1$, it is assumed that the swaths were formed as a result of a (more-or-less) constant velocity forward flight by the UAV. In other words, it is assumed that each image is a contiguous extension of the previous one, and that swaths can be reconstituted as a single continuous image from their component images. Consequently, after the creation of $I_i'$, the closest corners to the previous image are identified and differenced, and the offsets, $(\Delta E_i, \Delta N_i)$, added to the corners of the relevant coordinates of subsequent images. This enables the small errors in the GNSS coordinates recorded by the pushbroom sensor to be aligned such that the swaths form a single continuous image.
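A condensed MATLAB sketch of the fast rectangular placement followed by the per-image affine correction might read as follows; the index names and the trueCorners array are assumptions introduced for the example.

```matlab
% Fast (approximate) placement of the resampled image img into the
% blank composite C, followed by the per-image affine correction.
% trueCorners is an assumed 4 x 2 array of true corner positions (x, y).
C(r1:r2, q1:q2, :) = img;            % rectangular block allocation in memory

% Affine transform from the rectangle's corners to the true (slightly
% tilted, trapezoidal) corner pixel positions of the image.
moving = [q1 r1; q2 r1; q2 r2];      % three rectangle corners (x, y)
fixed  = trueCorners(1:3, :);        % corresponding true corners (x, y)
tform  = fitgeotrans(moving, fixed, 'affine');
imgCorrected = imwarp(img, tform);   % I_i' in the notation above
```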
Approximate Location of Swath: As $S_j$ is now located reasonably accurately within the pixel index framework of the orthomosaic, extensive computational effort is not required to find matching features between the two data sets. A logical mask, $M_j$, of $S_j$ is created and a large dilation operation conducted. A mask size of 5–10 m (converted to pixels by dividing by GRD) typically proved sufficient. The resulting (larger) mask, $M_j'$, is now larger than $S_j$ and $M_j$, but still covers the correct location, whilst also accounting for GNSS-IMU navigation sensor errors between the (potentially different) UAVs carrying the RGB and hyperspectral pushbroom sensors. The mask is then applied to the whole orthomosaic, such that the resulting partial orthomosaic, $O_j$, and hyperspectral swath, $S_j$, can be accurately georectified in a manner that maximises computational efficiency. In other words, the prospect of matching features within $S_j$ with those outside its equivalent section of the orthomosaic is minimised as the two images contain (almost) all of the same features (edges, corners, blobs, ridges, etc.).
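In MATLAB, this masking step might be sketched as follows; the 7.5 m dilation size is an illustrative value from the 5–10 m range given above, and the variable names are assumptions.

```matlab
% Sketch of the approximate-location masking step.
maskS   = any(swathImg > 0, 3);            % logical mask M_j of the swath
bigMask = imdilate(maskS, strel('square', round(7.5/GRD)));
partialOrtho = ortho .* double(bigMask);   % masked partial orthomosaic O_j
```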
Extraction of Matching MSER and SURF Points: Maximally stable extremal regions (MSER) [57] and speeded-up robust feature (SURF) [58] extraction techniques were jointly employed to detect key features within the orthomosaic and hyperspectral swaths. The SURF implementation is that implemented in MATLAB and thus similar to that described in Section 3.4 of [8], and readers are referred to that paper for a description of the mathematics involved and the rationale behind the technique’s use. Although SURF is a scale-invariant feature detector, it is known to offer a much better performance when the features are of comparable scale and the luminance values similar [59].
As SURF detects features based on textural analysis, it will identify very few features from an image that contains flat homogenous terrain covered by little or no vegetation. In such a case, the MSER feature extraction function is valuable, as MSER incrementally steps through the luminance range of the images to detect stable intensity regions. MSER therefore complements the SURF approach by performing well over homogenous areas of geomorphology [60]. This increases the number of true feature matches likely to be identified between scenes in the event they were captured under slightly different lighting conditions.
The MSER implementation is that provided by MATLAB and the key parameters used by MSER and SURF include: the “strongest feature threshold”, “number of octaves” and “feature size”. The values of these parameters used in this study were 100, 1 and 64, respectively. Their values were derived empirically, and readers are referred to [8,58,59,60] for more details.
Prior to any feature detection, however, a 0.5-pixel 2D Gaussian smoothing kernel is applied to the input images as this slightly improves the inlier feature matching performance of MLESAC.
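Under these parameter settings, the detection stage might be sketched in MATLAB as follows; the image variable names are assumptions, and the swath image is processed identically.

```matlab
% Sketch of the joint SURF/MSER feature detection on the smoothed
% luminance image (orthoLum is an assumed single-channel input).
Ismooth = imgaussfilt(orthoLum, 0.5);    % 0.5-pixel 2D Gaussian kernel

surfPts = detectSURFFeatures(Ismooth, 'MetricThreshold', 100, ...
                             'NumOctaves', 1);
mserPts = detectMSERFeatures(Ismooth);   % stable intensity regions

[fSurf, vSurf] = extractFeatures(Ismooth, surfPts, 'FeatureSize', 64);
[fMser, vMser] = extractFeatures(Ismooth, mserPts);  % SURF descriptors

featO = [fSurf; fMser];                      % combined descriptor set
ptsO  = [vSurf.Location; vMser.Location];    % combined point locations
% The same operations are applied to the swath to obtain featS and ptsS.
```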
Selection of Matching Points by MLESAC: An MLESAC algorithm [61] was employed to ensure outlier-corrupted feature matches do not contaminate the derivation of the best transformation model. As per the previous section, readers are once again referred to [8] (Section 3.5) for a description of the mathematics and rationale behind MLESAC and its place in the workflow. The values of key parameters (see [8,61] for more details) such as “search radius” (search radius associated with identified centre points), “matching threshold” (threshold representing a percentage of the distance from a perfect match) and “ambiguous match ratio” (used to reject ambiguous matches) were 5 m, 100 and 0.6, respectively. Once again, the parameter values were obtained empirically.
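A sketch of the matching and robust model-selection stages follows. Note that MATLAB's built-in estimator, estimateGeometricTransform2D, implements an M-estimator sample consensus (MSAC) variant; it is used here only as a stand-in for the MLESAC step described in the text, and the conversion of the 5 m search radius to pixels is an assumption.

```matlab
% Sketch of matching and robust transform estimation (MSAC standing
% in for MLESAC; parameter values as reported in the text).
idx = matchFeatures(featO, featS, 'MatchThreshold', 100, 'MaxRatio', 0.6);

matchedO = ptsO(idx(:,1), :);
matchedS = ptsS(idx(:,2), :);

% Robustly fit a projective model, rejecting outlier matches; the
% 5 m search radius is expressed in pixels via the GRD.
[tform, inlierIdx] = estimateGeometricTransform2D(matchedS, matchedO, ...
    'projective', 'MaxDistance', 5/GRD);
```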
Geographical Transformation and Mosaicking: A 2D geometric projective transformation [62] is used to convert each swath, $S_j$, to its correct location and orientation, $S_j'$. The process requires at least four matching feature pairs so that the resultant transformation model, $T_j$, can account for translation, scale, shear and rotation. However, unlike the affine transformation, where parallel lines are constrained to remain parallel, for a projective transformation, parallel lines are permitted to converge to a vanishing point. Generally, the more feature matches correctly identified, the more accurate the transformation.
By applying $T_j$ to each spectral band, RGB images, hyperspectral indices and entire data cubes may be created through the progressive placement of georectified swaths into the (initially blank) mosaic composite matrix, with one mosaic per band stacked as a raster cube. Pixels of the cube or RGB mosaic that overlap with one another may be determined by using the value from the last swath added to the mosaic or by applying a mask to the intersection of the current swath and existing mosaic and averaging the relevant pixels before combining them into the mosaic. The latter is preferred.
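A minimal sketch of this final warp and the preferred overlap-averaging option is given below; the variable names are illustrative.

```matlab
% Warp the swath into the mosaic's pixel frame and average overlaps.
Rfixed = imref2d(size(mosaic, [1 2]));     % mosaic pixel frame
warped = imwarp(swathCube, tform, 'OutputView', Rfixed);

overlap = any(mosaic > 0, 3) & any(warped > 0, 3);
for b = 1:size(mosaic, 3)                  % one mosaic layer per band
    mb = mosaic(:,:,b);  wb = warped(:,:,b);
    mb(~overlap) = mb(~overlap) + wb(~overlap);    % place new pixels
    mb(overlap)  = (mb(overlap) + wb(overlap))/2;  % average overlaps
    mosaic(:,:,b) = mb;
end
```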
Creation of 3D Dense Point Cloud: Following the rectification and alignment of the hyperspectral images to the RGB orthomosaic—which was derived from the DPC (in MetaShape) and is a matrix of dimension $U \times V \times 3$—a grid-averaged raster is formed. The process computes an axis-aligned bounding box for the entire point cloud, dividing it into $U \times V$ grid boxes of size GRD. Points within each grid box are merged by averaging their locations, colours and surface normals. As the orthomosaic and rastered DPC are already aligned, it is then a simple process to align the hyperspectral mosaic to the grid-averaged DPC by allocating height and surface normal properties to each pixel. This provides a dense hyperspectral point cloud.
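Assuming the Computer Vision Toolbox is available, this grid averaging might be sketched as follows (pcdownsample's 'gridAverage' option merges the points in each grid box by averaging their locations, colours and normals); the rastering indices and variable names are illustrative.

```matlab
% Sketch of the grid-averaged raster and hyperspectral DPC alignment;
% xyz (N x 3 double), rgb (N x 3 uint8) and normals (N x 3) are assumed.
pc     = pointCloud(xyz, 'Color', rgb, 'Normal', normals);
pcGrid = pcdownsample(pc, 'gridAverage', GRD);  % average per GRD grid box

% Raster the grid-averaged points into the mosaic's U x V pixel frame
% so each pixel can be allocated a height (and surface normal).
col = round((pcGrid.Location(:,1) - Emin)/GRD) + 1;
row = round((Nmax - pcGrid.Location(:,2))/GRD) + 1;
heights = nan(U, V);
heights(sub2ind([U V], row, col)) = pcGrid.Location(:,3);
```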
5. Experimental Results
The accuracy of the results was evaluated using a combination of qualitative and quantitative factors. The qualitative assessment of the georectified hyperspectral mosaics was carried out by visually inspecting the imagery for gaps, mismatches, deformations, duplicates and discontinuities (Figure 6). Initially, no manual adjustment was applied, i.e., Figure 6, Figure 7, Figure 8, Figure 9, Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15 and Figure 16 show results for the automated workflow only.
Figure 7 (left) shows a mismatch across an image boundary, and Figure 7 (right) shows duplicate features in neighbouring across-track images. Figure 8 (left and right) shows gaps and discontinuities in contiguous image sequences, and Figure 9 (left and right) shows discontinuities in the lines of young trees. Comparing these to the georectified mosaic derived from the workflow described in this study (Figure 10), and particularly the zoomed regions (Figure 11, Figure 12 and Figure 13), shows that linear features such as roads, gullies and tree lines now have continuity across and within swaths, and the visual consistency, sizes and shapes of objects such as trees and branches are also well-maintained (note: the text in these figures is best viewed in a ‘zoomed’ version of the manuscript online). As mentioned in the section on hyperspectral data acquisition, however, the low swath overlap, combined with sensor misalignment errors, still results in some gaps between contiguous swaths (note: the faint lines between individual images in Figure 12 and Figure 13 indicate the edge of each image; they can be removed using interpolation).
The qualitative assessment of the 3D DPC was carried out by visually inspecting point clouds colour coded using the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI). Accurate georectification and mosaicking of the hyperspectral images will result in high levels of NDVI/EVI (vegetation) coinciding with higher z-axis (height) values of the DPC. In Figure 14 (and Figure 15), it can be seen that this is the case: the tall trees at the rear of the site and the shrubs and flora scattered around the site more generally all have high values of NDVI (depicted by darker red colours in Figure 14 and Figure 15), whereas the lower levels of NDVI (the orange-coloured elements) correspond to the surface of the terrain. Figure 16 shows the NDVI 3D DPC for the site at Goat Farm B, which contains repetitive scene structures of young trees. The trees, which form a regular repetitive pattern over the ground (and have high values of NDVI), closely correspond to the small (elevated) bumps in the DPC.
Adjusting the auto-rectified hyperspectral mosaic using the manually identified reference points (as described above) made no discernible improvement to the qualitative assessment of either the 2D mosaic or 3D DPC, except that there was a slightly better alignment between the higher NDVI values and elevated structures in the DPC. In other words, there were fewer poorly aligned tree structures slightly offset from their higher NDVI values (as can be seen for the auto-rectification approach in the upper righthand corner of Figure 16).
Quantitative assessment of the relative positional accuracy of the auto-georectified hyperspectral mosaics is the most important factor for evaluating the workflow. The Euclidean distances between the east and north coordinates of randomly selected pairs of reference points, identified across both the georectified RGB hyperspectral and original orthomosaic images, were compared. Basic statistical metrics (mean absolute error (MAE), RMSE and circular error probable (CEP) at the 95% confidence level) were then used to characterise the relevant error distributions. Using 50 randomly distributed reference points, the accuracy achieved for each of the sites is shown below in Table 1.
It is worth noting that any tool that can be used to assess discrepancies between images, mosaics or DPCs can also be used to improve alignment between them. Consequently, prior to the creation of the 3D DPC and assessment of any automated georectification method, the software used to instantiate the workflow described in this study offers users the opportunity to manually identify matching reference points between the hyperspectral and RGB orthomosaic images. As the tool (provided only in MATLAB) offers predictive point projections, and the images are already accurately mosaicked, the activity only takes about two minutes per 10 pairs of reference points.
These manually identified control points may then be used to reduce any residual discrepancies between the two mosaics. Piecewise-linear transforms are used, such that the hyperspectral mosaic is broken into local piecewise-linear regions and a different affine transformation applied to each region. The accuracy achieved after this manual process is shown below in Table 2.
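In MATLAB, this refinement might be sketched as follows, using cpselect (the control-point tool referred to in this study) and a piecewise-linear ('pwl') geometric transform; the variable names are illustrative.

```matlab
% Sketch of the manual refinement: control points picked with cpselect,
% then a piecewise-linear transform applying a different affine
% mapping to each local region of the hyperspectral mosaic.
[movingPts, fixedPts] = cpselect(hsMosaic, rgbOrtho, 'Wait', true);

tformPWL = fitgeotrans(movingPts, fixedPts, 'pwl');
refined  = imwarp(hsMosaic, tformPWL, ...
                  'OutputView', imref2d(size(rgbOrtho, [1 2])));
```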
Computational Efficiency: A key criterion of any automated technique is its computational efficiency, which is usually expressed in terms of how long it takes to process data on a particular machine. The workflow should be faster, more convenient and more robust than the alternatives. Unfortunately, very few methods report their computational efficiency, Angel et al. [8] being a notable exception. We therefore compare our approach to theirs.
The workflow was entirely coded in MATLAB and executed on a Dell laptop with 32 GB RAM and an Intel® Core i7-8850H CPU (6 cores) running at 2.60 GHz. Angel et al. [8] ran parallelised MATLAB code on a desktop with an Intel Xeon E5-2680 v2 processor that had 200 GB RAM and 20 cores running at 2.8 GHz: a considerably more powerful machine. Unlike [8], code parallelisation was not used in this study and the laptop had only a modest GPU (Intel® UHD Graphics 630).
The amount of user and computational effort required to manipulate data is highly dependent upon the size of the site and GRD. In particular, computational effort is closely related to site area/GRD. Both studies created mosaics of roughly the same size, around $10^4 \times 10^4$ pixels. For mosaics of this size, manual co-registration requires about 3 min of user effort per pair of identified reference points. Thus, for a typical site used in this study (90 individual hyperspectral images), if eight pairs of reference points are identified per image, the manual georectification task will take around 36 hours’ continuous effort, i.e., almost a standard working week of full-time effort (note: although we did not perform formalised time testing, we did manually align the Goat Farm data sets and this amount of projected effort derives from our experience). Alternatively, if effort is expended upfront generating individual swaths during the pre-processing stage, and 27 matching pairs of points are needed to align each of these to the orthomosaic (as per [8]), around 20 h of user effort is required if the 90 images form 15 swaths.
Using the workflow devised by [8], which starts with the creation of individual hyperspectral swaths, this is reduced to 3.7–6.5 h. However, it is unclear how much additional effort they required to create the individual hyperspectral swaths during the pre-processing stage. If the SpectrononPro3 software had been used, considerable additional user effort would have been needed: the hyperspectral data sets must be broken into contiguous sequences of raw data commensurate with each swath (groups of consecutive files); and yet more effort is needed if pitch, roll, yaw and timing misalignment offsets need to be optimised before any swath pre-processing takes place.
To increase the degree of automation, we created a “batch processing” approach that requires very little user effort. The pre-processing stage can simply deliver its output to a single folder and the workflow then processes each file/cube individually, accurately georectifying and aligning the output into the mosaic and DPC. Overall, the task of importing all the data, georectifying and aligning it into a hyperspectral mosaic and then displaying the output is reduced to around 15 min, with the generation of the hyperspectral 3D DPC only taking a further 3–6 min (Table 1). This represents a significant improvement in computational efficiency over previous automated and semi-automated techniques.
6. Discussion
There are several different approaches to georectifying, aligning and mosaicking UAV-based hyperspectral pushbroom data. Those making use of co-registration techniques with RGB-based orthomosaics tend to perform better in terms of accuracy and output than those reliant upon dense networks of GCPs [21,23]. The most accurate automated approach we are aware of is that of Angel et al. [8], who achieved MAE and RMSE values of around 7 and 10 times their GRD, respectively (but as good as 1–1.5 times). We achieved average ratios for MAE and RMSE of around 13 and 8, respectively.
As the RGB orthomosaic and hyperspectral mosaics are co-registered to the same pixel framework, a visual assessment of georectification and alignment results is readily achieved by repeatedly switching between images superimposed over one another. Undertaking qualitative examination of the results in this way indicates that the error distributions are generally evenly distributed (as per [8]) across the sites. However, this visual examination also indicates that there are regions of very high accuracy contrasted with regions that have a lower accuracy. This regional accuracy/inaccuracy is also borne out by visual examination of the structure of the DPC and hyperspectral mosaics.
The prime significance of the above is that, whilst global accuracies for the technique suggest that mosaics and DPCs may not align sufficiently well to assist in modelling complex surface reflectance characteristics under real-world conditions, regions of the DPC and hyperspectral cubes clearly are accurately aligned. Furthermore, these regions are readily identified by visually toggling between the reconstructed and target mosaics.
The regional accuracy does not appear to be related to the number of matched features, but more the constraints that result from a projective transform applied to an entire swath. This suggests that accuracy gains may be available by reducing the number of images processed together (as swaths) prior to application of the feature detection and matching stages. It also suggests that targeted manual co-registration using the MATLAB tool CPSELECT (referred to above) could offer regional improvements. This is the focus of ongoing research.
In theory, SURF and MSER are agnostic to textural variations caused by luminance differences between images. In practice, however, similarity in luminance values across data sets provides an improved performance during the feature matching stage. Although [8], like this study (but unlike earlier ones), establishes a comparison between hyperspectral and RGB imagery data (earlier studies often used only a single band), its white balance and luminance values are not normalised or rescaled. This additional manipulation improves the efficiency and performance of SURF/MSER and MLESAC during the feature detection, co-registration and matching stages. Typically, between 1000 and 4000 matches are generated per swath, i.e., around 15,000–50,000 matches per mosaic. Such an approach is vital to the creation of an accurate projective transform model.
Additionally, it also accounts for (some) variation in the lighting that can occur between the hyperspectral and RGB sensor flights, as well as radiometric differences. It thus offers additional operational flexibility. That is, whilst it is ideal for flight speed, altitude, frame-rate sampling and FOV to be set such that data are observed at a spatial resolution roughly equal to that of the RGB sensor, the workflow described here reduces the need for simultaneous (or near-simultaneous) data collection, i.e., the need for data capture under roughly similar atmospheric, and thus illumination, conditions is relaxed. Similarly, although the GRDs of the RGB and hyperspectral sensors almost always differ a little in practice, because resampling is an intrinsic component of our workflow (to align the mosaic pixel indices), features appear in each image set at roughly comparable scales. These factors combine to simplify flight planning.
Other researchers have shown that only a few GCPs, high-precision base stations and good GNSS-IMU sensors integrated with the cameras are needed to produce high-quality results. For instance, ref. [63] showed that 3 GCPs/Ha were sufficient to deliver sub-centimetre accuracies, and ref. [22] likewise achieved similar accuracies with only a few GCPs.
Although at least six AeroPropeller auto-GCPs (www.aeropropeller.com) were deployed at each site to accurately georectify the orthomosaic and DPC, as the absolute position of the MetaShape products within the WGS84 frame of reference was unimportant to this study, the GCPs were not used in the workflow. In other words, whilst the GCPs could have been manually or auto-identified by MetaShape, and the RGB orthomosaic then more accurately composed relative to an absolute framework, this was not undertaken because any georegistration improvement in the orthomosaic would not have contributed to the outcome of this study. Moreover, and perhaps more importantly, visual analysis of the SURF and MSER feature detection and matching showed that the use of the GCPs did not influence the performance of the method in terms of its automation or accuracy (there were typically in excess of 1000 good feature matches per swath between the two image sets). Also, the use of GCPs that auto-register within a first-order survey network impacts only on the time taken to generate the RGB orthomosaic using MetaShape and not on our workflow.
Another facet of the data sets used in this study is that, whilst the Goat Farm data comprise young trees, which, like the tomato crops and date plantations used in other studies, are planted in regular lattice-like structures, the features of the PEGS data sets are more irregular and ill-structured. They are therefore more susceptible to textural variability between the RGB and hyperspectral imagery. Such effects result from the variability of (for instance) shadows cast by trees. In the PEGS data sets, which comprise naturally occurring flora as opposed to strategically planted young trees, the shadows are typically cast in a less well-formed manner, both in terms of contrast and shape. This typically impacts the feature detection and matching stages negatively, and this appears to be borne out by the accuracy results (Table 1).
In terms of computational efficiency, the focused application of SURF and MSER to the (masked) elements of the RGB orthomosaic that are most likely to contain matching features significantly reduces the image processing load relative to earlier approaches [6,8]. Furthermore, based on descriptions by researchers who relied on PARGE [43,64] to deliver sub-metric accuracies (4 h for georectified airborne scans comprising 200 spectral lines for $10^6$-pixel mosaics), it is thought that the workflow described here requires considerably less computational and user effort. In particular, significantly less user effort is thought to be required during pre-processing as our workflow intentionally employs a batch processing approach, with georectification and mosaicking taking only 18–24 min, depending on whether or not a 3D DPC is also required.
Angel et al. [8] made use of MetaShape’s “ultra-high” levels of DPC resolution. In our experience, whilst the process is largely automated, these very-high-quality DPCs take a very long time to synthesise (often in excess of 24 h), even on high-performance computers. The study reported here used only “high” resolution DPCs, which typically take only around 2 h to generate for sites comprising 100 images.
Similarly, executing SpectrononPro3 in its batch processing mode takes around 40 min to process 100 hyperspectral data cubes and place the output into a folder, ready for input to the workflow reported in this manuscript. Considerable additional user effort would be needed to break contiguous data sequences into file sets commensurate with each swath, i.e., in preparation for the approach used by [8]. For instance, for a data set of 100 pushbroom hyperspectral cubes forming 15 swaths and with data extraneous to the region of interest, it is estimated that such a task would take of the order of 2–3 h. This suggests further potential time savings are available relative to the approach adopted by [8].
Finally, although the field data indicate that the automated technique described here for georectifying, aligning and mosaicking hyperspectral pushbroom data is simultaneously computationally efficient and accurate, there are a few improvements that could enhance its performance. The first is code parallelisation. Despite the workflow not employing parallelised code, this could easily be attempted. However, it should be noted that the mosaicked imagery is progressively built from composite images using matrix or pixel index operations. Parallelisation would thus likely require a computer with considerably more RAM.
Further work could also examine the use of GCP networks, and in particular the auto-detection of the individual boards, over several study sites to examine the absolute position accuracy of hyperspectral mosaics. There may also be merit in comparing a variety of feature identification techniques (SIFT, SURF, BRISK, ORB, KAZE, etc.) to see which deliver optimal performance. However, the results would likely be influenced by the test data set, resulting in the need for it to be significantly expanded. Furthermore, there may also be merit in arriving at parameter sets for the SURF, MSER and MLESAC algorithms used in this study by more methodical means than the empirical methods used herein. A comprehensive cross-comparison of parameters (in particular against varying data sets) would potentially deliver performance enhancements. Once again, however, such an undertaking would be significant.