High Spatial and Temporal Resolution Energy Flux Mapping of Different Land Covers using an Off-the-Shelf Unmanned Aerial System

: With the development of low-cost, lightweight, integrated thermal infrared-multispectral cameras, unmanned aerial systems (UAS) have recently become a flexible complement to eddy covariance (EC) station methods for mapping surface energy fluxes of vegetated areas. These sensors facilitate the measurement of several site characteristics in one flight (e.g., radiometric temperature, vegetation indices, vegetation structure), which can be used alongside in-situ meteorology data to provide spatially-distributed estimates of energy fluxes at very high resolution. Here we test one such system (MicaSense Altum) integrated into an off-the-shelf long-range vertical take-off and landing (VTOL) unmanned aerial vehicle, and apply and evaluate our method by comparing flux estimates with EC-derived data, with specific and novel focus on heterogeneous vegetation com-munities at three different sites in Germany. Firstly, we present an empirical method for calibrating airborne radiometric temperature in standard units (K) using the Altum multispectral and thermal infrared instrument. Then we provide detailed methods using the two-source energy balance model (TSEB) for mapping net radiation (Rn), sensible (H), latent (LE) and ground (G) heat fluxes at <0.82 m resolution, with root mean square errors (RMSE) less than 45, 37, 39, 52 W m − 2 respectively. Converting to radiometric temperature using our empirical method resulted in a 19% reduction in RMSE across all fluxes compared to the standard conversion equation provided by the manufac-turer. Our results show the potential of this UAS for mapping energy fluxes at high resolution over large areas in different conditions, but also highlight the need for further surveys of different vegetation types and land uses. This is the first to demonstrate the capabilities of a long-range, UAS (Quantum Systems F90+ and Altum) to map Rn, H, LE and G energy fluxes over different land cover types and report its accuracy and limitations. We provide


Introduction
Measuring the surface energy balance is an important task for improving our understanding of land-atmosphere interactions at a time when global climate is shifting towards net warming [1][2][3][4]. The magnitude of warming/cooling through radiative and non-radiative processes is related to land surface properties; for example, dense, tall canopies tend to have relatively low-albedo which causes high energy influx to the system (net radiation, Rn) [5,6]. Decreasing albedo in isolation can have a warming effect, however it can be negated by structural properties such as leaf area index which promote transpiration/latent heat flux (LE) over sensible (H) and ground (G) heat effluxes [1,7,8]. This interplay between Rn, LE, H and G, as moderated by the type, physiology, structure and distribution of vegetation that are present, can influence climate at all scales; from the local scale where shade trees can reduce local temperatures (thus mitigating climatic extremes in agroforestry settings [9]), to regional scale weather systems where the amount and distribution of tree cover can influence precipitation patterns [6,10]. Therefore, the accurate measurement of these fluxes over different land use and land cover types is of great interest to food producers, climatologists and land/water resource managers alike.
Measuring energy fluxes at landscape scale is a non-trivial task. Perhaps the most established technique is via Eddy Covariance (EC). EC stations typically compute Rn using irradiation measurements from pyranometers/net radiometers, H and LE from measuring the covariance between heat and moisture variance and turbulent mixing [11], and G using soil heat flux plates [12]. The method requires integrating high frequency (>10 Hz) gas flux and vertical eddy measurements over 30 min time steps to capture the full range of eddies passing the sensors at different rates (i.e., small eddies move over seconds, large-scale turbulence over minutes). EC stations are usually located in areas of homogeneous vegetation classes within the upwind area ("fetch" or "footprint") of the tower. Therefore widespread networks of towers are required to capture the variety of fluxes across different vegetation and land cover types which represent the landscape, such as ICOS and TERENO [12,13].
Resistance energy balance models are sound methods used in remote sensing. They rely on the collection of radiometric surface temperature (Trad) using thermal infrared (TIR) sensors, to effectively cover the survey area of interest. Plant structural and meteorological parameters and Trad are used as inputs to produce spatially distributed flux estimates. Originally applied to satellite remote sensing data, a number of energy balance models have been successfully adapted for use with very high resolution imagery, either from manned piloted airborne platforms or Unmanned Aerial Systems (UAS). UAS refers to remotely piloted/unmanned aerial vehicles (UAVs) with a sensor mounted on board. With the recent technological advances in miniaturized cameras and navigation systems, UAS are increasingly being used as a more cost-effective and flexible tool in mapping energy and water fluxes, using models such as "mapping evapotranspiration with internalized calibration", METRIC [14,15], and Two Source Energy Balance model, TSEB [16][17][18][19][20]. To date TSEB is one of the most widely-used spatially-distributed model [21], and has delivered accurate flux estimates using both satellite-and UAS-derived data, independent of resolution [19], in a variety of settings [18,[22][23][24][25].
Remote sensing can provide additional information to EC methods. Perhaps most importantly, by using high resolution gridded data, remote sensing can be used to estimate evapotranspiration over individual plants (or plant clusters) [18], and can capture fluxes from different land cover types in one survey [23,26], although many studies have focused on a single vegetation type for testing [21]. Furthermore, the resistance energy methods produce spatially distributed estimates of soil heat fluxes, whereas the EC station ground heat flux plates are fixed to one location, typically near the base of the station [12]. While G can be extrapolated if the vegetation is homogenous, G may vary spatially during in-situ land cover change events (e.g., harvest, mowing or leaf senescence). While the UAS method is not restricted in range by the necessity for homogeneous land cover, the range of the survey area is limited by battery life, local airspace restrictions and air laws, and operations are limited to a narrow window of meteorological conditions (e.g winds below 9 m s −1 ). Some platforms have, however, surveyed very large areas in a single flight [27,28]. Typically the reported accuracy of TIR cameras is low, meaning additional control and evaluation steps are often taken to reduce errors caused by internal calibration processes, variability in surface emissivity and atmospheric conditions [18,29,30]. Even though spatially distributed resistance models can only estimate instantaneous fluxes at the pixel scale (compared to EC stations which take measurements 24 h per day, collecting raw data up to 20 times per second) it is still possible to extrapolate hourly/daily flux estimates using remote sensing data [26,31,32]. It should also be noted that some of the constraints of the EC method also apply to the UAS method; both are restricted to flat terrain [13], are limited by sub-optimal meteorological conditions, i.e., atmospheric stability can hinder retrieval of turbulent fluxes [33,34], and both require expert knowledge and experience to collect, process and analyse the data. Additionally, the TSEB model used to estimate the UAS-derived energy fluxes, requires several underlying assumptions in addition to the prerequisites of the EC method, e.g., flux-gradient similarity and accurate knowledge of all non-turbulent surface energy-balance components, including storage terms, which are difficult to measure and therefore often assumed negligible [35]. While UASs are often cheaper than EC stations, they cannot replace EC stations because they provide the most direct method of measuring all major fluxes, whereas UAS can only estimate Rn, H and G, then assign the residual imbalance to LE [7]. Therefore EC methods provide an ideal control for assessing the quality of UAS-derived fluxes [19,36].
Typically, thermal infrared cameras have a lower pixel resolution than standard RGB and multispectral cameras, and as such the tie point-matching algorithms in photogrammetry software (such as Agisoft Metashape and Pix4d) can only function well if there is very high forward and sideward image overlap. This invariably limits the range of the UAV, and consequently the size of the survey area. Some thermal camera manufacturers have integrated an RGB or multispectral camera into the unit so that they have overlapping field of views with the thermal camera [30]. Photogrammetry software can then use the higher resolution optical bands to create accurate thermal orthomosaics. These systems are advantageous because in one single flight a multitude of datasets can be collected (e.g., RGB orthomosaics, vegetation and soil indices, 3D point clouds, as well as radiometric temperature). An example of this system is the Micasense Altum, which has been used for a variety of purposes, including drainage mapping [37], high throughput plant phenotyping [38], bark beetle infestation mapping [39], and vineyard crop health [40]. Many of the existing studies have used thermal infrared cameras mounted on multicopter UAVs, while these platforms can be very versatile, their range can be very limited even without accounting for high image overlap (typically < 20 hectares). The range of fixed wing UAVs can be much greater, however there are operational limitations such as the requirement for large open areas for landing and take-off. Some UASs developed in research institutions (such as the very successful AggieAir system, by [28]) have comprehensive payload sensors (e.g., short, medium, longwave sensors) impressive flight times, and collect scientific standard datasets. Building this kind of system requires expert skills and experience, time, resources and of course money, something not available to many researchers, land and water managers. Several off-the-shelf (OTS) solutions are now on the market aiming to address some of these limitations; some manufacturers have developed vertical take-off and landing (VTOL) fixed wing aircraft which have the take-off and landing flexibility of multicopters, as well as the range advantages of fixed-wing UAVs, with some including integrated RGB/multispectral and thermal infrared cameras. To date few studies have tested their ability to produce reliable and accurate energy flux estimates [21,30], and never with a long-range VTOL UAS.
In summary, there is a wealth of continuous flux data from EC stations across the globe at locations with different land cover types; however resolving the spatial variation in fluxes in areas with mixed vegetation (i.e., heterogeneous plant communities, plant canopy structures etc.) is more challenging. Here, UAS data has the potential to detect and estimate energy and water flux variability at very high resolution, so that we may assess how different land use practices can influence fluxes, the climate and hydrological cycle. An important step in realizing this capability is to calibrate and validate UAS-based fluxes using high-accuracy, directly measured fluxes from EC station in a variety of settings.
This study is the first to demonstrate the capabilities of a long-range, OTS UAS (Quantum Systems Trinity F90+ and Micasense Altum) to map Rn, H, LE and G energy fluxes over different land cover types and report its accuracy and limitations. We provide a detailed methodology for producing energy balance model input datasets that is reproducible across multiple sites with different environmental conditions and plant communities, and present error statistics based on control data from EC stations. Furthermore, we assess two methods of mapping energy fluxes (Two Source Energy Balance Priestley-Taylor, and Dual Time Difference) and report their correlation against three methods of closing the surface energy balance in EC station data.

Methodological Overview
This paper demonstrates a methodology for producing high-resolution spatially distributed energy fluxes using an OTS unmanned aerial system. Fluxes derived from an EC station were used to assess the accuracy of the UAS-derived fluxes by flying the UAS over the "footprint" of the EC station, then comparing the spatially distributed fluxes within the EC footprint with the reported EC estimates for the coincident time period. We use the Two-Source Energy Balance (TSEB) model in this paper and briefly discuss how it calculates the fluxes, however we have not modified the calculations that TSEB performs, therefore a full description of the mechanics behind TSEB can be found in [18] or [7] or [20]. This section will sequentially detail the data collection, processing and evaluation methods as summarized in Figure 1.

Eddy Covariance Station Sites
Four different eddy covariance stations across three sites were used to test flux mapping using the UAS. Two EC stations are on grassland sites (Graswang, DE-Gwg [41], and Fendt, DE-Fen [42], which are both operated and maintained by the Karlsruher Institut für Technologie (KIT) network. Full details on the instrumentation can be found in [12]. Mooseurach is a peat bog that was used for forestry until 2016. After several windfall events and increasing bark beetle infestation, the area was cleared. Since then there has been no further management on the area. In addition to the existing 30 m tower (Mooseurach Tall), a 6 m tower (Mooseurach Small) was installed in the former main footprint. This site is part of the ICOS Network [DE-Msr] [43,44], site details and information can be found in Table 1.

EC Station data and Footprint Delineation
Flux data are calculated over 30 min intervals so that the full range of eddy sizes can be measured. Flux data at Graswang and Fendt were calculated using TK3.1 [45][46][47], whereas data from Mooseurach were calculated using the Eddy-Pro 7.04 by LI-COR Biosciences, (LI-COR Inc., Lincoln, Nebraska, United States). The EC stations at all sites provided supplementary meteorological data, as well as irradiance data as input parameters for the TSEB model.
Footprint delineation provides a spatial component to EC station-based flux estimates so that UAS and EC station-based methods are comparable. For Graswang and Fendt, TK3.1 enables the user to produce a footprint estimation as a spatial probability distribution (in ASCII format) for each 30 min time step using the model by [34]. The performance of the model from [34] that has been used for this study was evaluated by [48] for the Graswang site by a series of controlled tracer release experiments. The results of this study show that it produces realistic footprint estimates. At least for this measurement height and this surface type, the Kormann and Meixner model also agrees well with newer models, such as the FFP model of [49].
Each footprint raster was produced at 1 m resolution, andwas subset using the 98 th percentile because this value consistently produced footprints which largely covered the areas in which the EC stations were situated, and was therefore estimated to provide a representative area in which the mean instantaneous fluxes were estimated from the UAS data ( Figure 2).
For Mooseurach, footprint extent estimates are provided in the Eddy-Pro outputs, which include footprint peaks, wind direction, 10, 30, 70, 90% distances. As the weather on the day of the UAS survey was relatively stable with very low wind speeds the footprint statistics were not available for many time steps, so the Mooseurach Small footprint was averaged as a 100 m buffer around the station, and the Mooseurach Tall footprint was delineated using the wind direction limits for the survey period, and maximum distance which overlapped the UAS survey area.

EC Station Energy Balance Closure
Net radiation (R) is the product of all incoming and outgoing short-and long-wave radiation, and during daytime defines the total energy flux entering the land surface system (minus reflection and emission back into the atmosphere). This energy has three main fates; sensible heat flux (H), the transfer of energy from surface to air (advection) and is transported by convection, latent heat flux (LE), the energy used for evaporation and transpiration of water (or evapotranspiration, ET), and ground heat flux (G), the energy exchange through conduction into the ground heat storage [6,50]. The surface energy balance is summarised by the following equation: where is net radiation, is the sensible heat flux, LE is the latent heat flux, is the ground heat flux and refers to an imbalance, which is a residual term if the energy balance is not closed (all in W m −2 ). EC stations measure each individual flux using instrumentation, which often leads to a lack of closure in surface energy balance (Imb) [51]. This can be caused by instrumental errors, errors in the processing chain (pertaining to averaging and flux correction methods), and additional terms that are not measured (canopy heat storage, biochemical storage, horizontal advection, and water pumping by plants) as covered in detail by [52]. To ensure comparability between EC and UAS-derived fluxes and to close the energy balance, we applied four different treatments to the EC station H and LE fluxes, [19,52]

Unmanned Aerial Vehicle
We used the Trinity F90+ (Quantum-Systems GmbH, Gilching, Germany), an off-theshelf (OTS) vertical take-off and landing (VTOL) UAV, with a maximum payload of 500 g. Its VTOL capabilities mean the minimum-required take-off and landing area is approximately 10 m × 10 m; from this small area the UAV launches/lands vertically in "coptermode", then transitions to horizontal flight once clear of nearby obstacles. With its optimal 17 ms −1 airspeed, the Trinity F90+ has a maximum flight time of 90 min, meaning it can cover an area of around 150 ha per flight at 100 m above ground level (with 76% sideward image overlap). Flight missions are designed using QBase v2.2.21 software, which links the transmitter, UAS and ground station together via 2.4 GHz wireless connection. QBase has a global digital terrain model and high resolution satellite imagery which allows for precise and safe horizontal and vertical flight path planning. The user can also set up flight spacing and payload sensor triggering to achieve a desired forward and sideward image overlap. The OTS UAS includes a GNSS base station (iBASE, Quantum Systems GmbH, Gilching, Germany) for post processing kinematics (PPK), which can be used to differentially correct image geotags, improving positional accuracy and Structure From Motion (SfM) processing times.

Sensors
We used two airborne sensor payloads for this study (Table 2); the Micasense Altum (Seattle, WA, USA) to produce multispectral (MS) and radiometric temperature orthomosaics and a canopy height model (CHM) for Graswang, and a Sony RX1RII RGB camera to produce a CHM for Mooseurach using structure from motion techniques. The Altum has five high-resolution multispectral bands (blue, green, red, red edge and near infrared) and an integrated long-wave thermal infrared (TIR) sensor (based on the FLIR Lepton) which is aligned with the MS sensors. The Altum has a separate Downward Light Sensor (DLS2, Micasense, Seattle, Washington, USA) to provide multispectral calibration information in changing light conditions. The TIR sensor recalibrates every 5 min or when a 2 K change in temperature occurs. The reported accuracy is +/-5 K with a thermal sensitivity of <50 mK.

. Surveys and Flight Parameters
In all, 30 flight surveys (with multiple surveys per flight) were conducted between 20/07/2020 and 20/10/2020, and a summary of the flight details can be found in Table 3. Local air navigation regulations limit the flight altitude to 100 m above ground level (AGL), and within visual line of sight. The UAV flies fully autonomously with terrain following capabilities, and maintains its airspeed and updates its heading in real-time to compensate for changes in wind direction and strength. The Altum and Sony have a maximum trigger rate of 1 Hz, and as such the maximum forward overlap that can be achieved is 76 and 78 % respectively (maintaining 100 m altitude, and a forward airspeed of 17 ms −1 ). Side overlap was set to 75 % in all cases, to ensure good quality tie point matching in orthomosaic production. Changes in groundspeed can lead to changes in the forward image overlap (i.e., maintaining airspeed, flying downwind/upwind would increase/decrease the groundspeed, and reduce/increase forward overlap). Therefore, when winds aloft were above 2 ms −1 (in the forecast) the UAS flight lines were set perpendicular to the prevailing wind direction. A period of 30 min for acclimatization (leaving the Altum powered on but not capturing images) is recommended before flying. For calibration of multispectral bands, images of the Altum calibration panel were taken before and after each flight from a height of around 1 m, with the panel clear of shadows.

. Orthomosaic and Point Cloud Production
Altum images were pre-processed in QBase, where individual image geotagging data were differentially corrected using the iBase PPK procedure, and saved to a text file. Preprocessing of Altum imagery (e.g., vignetting, dark pixels and radiometric correction) is applied to each image in Agisoft Metashape (St Petersburg, Russia). We used Metashape to create orthomosaics, individual orthophotos and point clouds. Multispectral bands were calibrated within Metashape using the pre-and post-flight Altum calibration panel images and DLS2 data if the conditions were sunny or overcast. Two orthomosaics were produced for each overpass of the survey area, one each at the native resolutions of the multispectral and thermal infrared bands. Point clouds were generated from RGB images taken using the Sony RX1RII.

Two Source Energy Balance Model
Here we provide a brief overview of TSEB as the full details (including equations) have been previously published in [7,18]. The Two Source Energy Balance model (TSEB) was developed by [7] and builds on the Shuttleworth-Wallace dual source energy model [53], by deriving both soil and canopy components of LE and H, i.e., having the ability of partitioning soil evaporation from plant canopy transpiration, rather than just the bulk surface fluxes. TSEB equations are solved at the pixel level using a set of gridded and single value inputs.
Gridded radiometric surface temperature (measured from the UAS), zenith viewing angle and leaf area index are used to calculate component canopy and soil temperature Tc and Ts (Equation (2)): equates to the vegetation fraction cover at the sensor viewing angle and can be calculated from the leaf area index (LAI) and clumping factor (Ω), (Equation (3)): The soil and canopy energy budgets are calculated from Tc and Ts separately (Equations (4)-(8)): The differences in and and the air temperature, drive canopy/soil sensible heat fluxes ( and ), and are moderated by resistance to heat transfer between the canopy-atmosphere ( ), and soil-atmosphere ( ) [54]. The first TSEB variant used in this paper solves both Tc, Ts, and Hc, Hs using as a first estimate a potential transpiration (LEc) based on the Priestley-Taylor (PT) equation ( ) and the fraction of LAI that is green (fg) (Equation (8)) after which the model iteratively increases Tc and decreases Ts (in Equations (2), (4)- (7)) to find realistic values which would lead to non-negative latent heat values during daytime. Terms ∆ and refer to the slope of the saturation vapour pressure versus temperature and psychrometric constant respectively [55]. Resistances are in turn a function of surface roughness (approximately 0.125 canopy height) [20,56,57]. TSEB-PT then apportions LE as the residual of Rn and H. TSEB-DTD (dual-time difference) is the second TSEB variant used in this paper. DTD works in much the same way as TSEB-PT to estimate Tc and Ts, however H is solved using two gridded Trad layers, and two air temperature values, with one measurement taken up to 1.5 h after sunrise when the values for Hc and Hs are minimal, and the other during the day. This method has previously reduced errors caused by atmospheric transmission and emissivity variations in a variety of settings [18].

Overview of Inputs
The TSEB models were implemented through the pyTSEB packages which are freely available via the GitHub repository at https://github.com/hectornieto/pyTSEB (accessed on 20/11/2020), and a full description of the input parameters can be found in the supporting literature at https://pytseb.readthedocs.io/en/latest/index.html (accessed on 20/11/2019). Input data can be provided as single values or rasters, and a full list of parameters and their sources in this study is given in Table 4. TSEB calculations for trees involved modification of four input parameters because it is assumed that air temperature and wind speed are measured sensitively higher than the target canopy; therefore wind speed and air temperature measurements were driven to 100 m above the ground. Hence the air temperature and wind speed were modified according to the adiabatic lapse rate and wind profile function respectively [58]. All raster inputs must have the same extent and resolution, and were processed at the native resolution of the TIR band (which varied between 0.67-0.81 m depending the survey flight heights), and the processing methods are detailed in the following sections. Given the reported uncertainty of the Altum TIR band was relatively high (± 5 K), we performed a control experiment using a ground-based Apogee SI220 Infrared Radiometer (Apogee Instruments, inc, Logan, Utah, United States) to evaluate the Altum's accuracy and precision when in flight. The Apogee radiometer has a measurement uncertainty of ± 0.2 K and is designed to measure canopy surface Trad. At each site, the Apogee radiometer was mounted on a tripod around 2 m from the ground, and aimed at the centre of an 8 m × 8 m control area of homogeneous grass. The control area is large enough to compensate for the "spot size effect" whereby radiometric temperature errors are increased when the number of pixels covering an object is less than 10 × 10 [59]. At a flight altitude of 100 m AGL, the ground sampling distance (GSD) of the Altum TIR band is around 0.7 m thus negating the spot size effect. Grass is also an ideal target material because of its high estimated emissivity (>0.97) [60], abundance and because it is representative of surface material whose temperature is measured in this study. The Apogee-Altum comparison across different sites, times of the day, air and surface temperatures is a simple method of correcting for errors brought about by atmospheric transmission, differences in target emissivity, different viewing angles, and changing air temperatures (which may affect instrument calibration. Data were logged every 20 s using an Arduino Uno (Arduino, Sommerville, Massachusetts, United States) with an SD card and a real-time clock module.
After "bundle adjustment" in orthomosaic production in Agisoft Metashapes, we exported individual orthophotos (images which have been orientated and georeferenced) that covered the temperature control area. For each orthophoto, the average digital number (DN) pixel value was extracted for the temperature control area, and the timestamp was extracted using the ij_tiff package [61] in R [62]. The orthophoto timestamp was then used to find the closest Apogee radiometer measurement, and the corresponding temper-ature was recorded. We used a linear regression analysis to examine the relationship between DN in the TIR band and Apogee-derived surface temperature. The slope and intercept values from this analysis were used to convert DN to K for all Trad inputs, and the results can be found in Section 3.1.
As a point of comparison, we also use the published Micasense equation (Equation (9)) to produce Trad inputs for the TSEB models, and these results are summarised in Section 3.2, and in Appendix A Figures A3 and A5, Tables A1 and A2. = 0.01 (9) where is observed radiometric temperature, and is the digital number.

Canopy Height Model (CHM)
Canopy height is an important parameter for calculating roughness length within TSEB. To create a CHM, point clouds were made in Agisoft Metashapes from very high resolution (<2 cm GSD) RGB images taken from the Sony RX1RII camera. As all of our survey sites were mostly flat (a prerequisite of EC station siting), it was relatively easy to produce a digital terrain model (DTM): Point clouds were firstly cleaned, filtered and decimated in CloudCompare [63], then ground-classified and normalised using LASground (step size 20 m) [64]. The LidR package [65] in R was then used to produce a gap-filled CHM.
Much of the area was dense or mowed grass, which produced unrealistic canopy height values because the actual canopy height in these areas is near the sensitivity of photogrammetric point clouds. For these areas, we measured the grass height by hand using a ruler and applied a standard grass height value to each land cover type using the Land Cover Map, detailed in the following section.

Land Cover Map (LCM) and Vegetation Masks
From the Altum images we produced orthomosaics with 5 optical and 1 thermal band in the native resolution of the thermal band. For each UAS survey the orthomosaic was stacked with the CHM as a 7 band input for a random forest classifier. Random Forest was chosen because it is robust against outliers, and the input datasets require minimal pre-processing [66]. These data were processed in R [62] using the method and code provided in [67] as a template. At least 5 polygons per land cover class were created as bounding areas for training/testing points, which filled each polygon at density of 1 per pixel. We used 10 folds for cross-validation. The training/testing datasets were split 70/30 (e.g., there were 4086/1748, 4587/1965, 2535/1083 training/testing points for Graswang, Fendt and Mooseurach respectively), and using 500 trees, classification accuracies were all >95%.
The LCMs were used to create building and shadow masks, as well as vegetation masks for the calculation of fluxes individually for each vegetation type.

Leaf Area Index (LAI)
We sampled the LAI of each vegetation type using a LI-COR LAI 2200 plant canopy analyser (LI-COR Inc., Lincoln, Nebraska, United States). We made several LAI sample measurements according to the manufacturer's guidance, and used "above canopy" measurements to correct for changing sky conditions. For mowed grass it was not possible to accurately measure LAI, and thus we predicted LAI based on published LAI-grass height relationship [68].

Green Fraction
The orthomosaics produced at the native resolution of the Altum's multispectral bands (approx. 0.05 m GSD) were used to create a normalized difference vegetation index layer (NDVI). The green fraction layer was calculated as the proportion of the ~0.67-0.81 m pixel covered by 0.05 m NDVI pixels with a value greater than 0.5.

EC Station-UAS flux Comparisons
TSEB model outputs for each individual vegetation class were merged for each survey flight, and the Rn, H, LE, and G values within the corresponding footprint were averaged. These UAS-derived flux values (W m −2 ) were compared to the uncorrected and corrected (for energy balance closure) fluxes estimated by the EC station. The EC stations report flux estimate quality control flags for each 30 min integration period which are based on the outcomes of two meteorological tests, the "steady state" and "turbulence" tests, and are both described in [69]. They reflect the confidence of flux estimates based on meteorology (e.g., atmospheric stability) and whether the assumptions used in the eddy covariance methods are violated. High quality flags indicate data suitable for research, moderate flags indicate data suitable for long-term analyses only, and low quality data should not be used for research purposes [69]. Here, only the highest quality flux data were included in this study.

Conversion Factor for Altum Thermal Infrared Band
Here we produce a study-specific conversion factor based on a linear regression between Altum TIR DN taken during UAS flights, and ground control measurements taken using an Apogee Infrared Radiometer (Figure 3). In total we made 146 control measurements, with surface temperatures ranging between −4.4 and 30.7 °C (Apogee measurements). When Altum DN is converted to Kelvin using Equation (9) (recommended by Micasense) and compared to Apogee measurements, the root mean square error (RMSE) is 3.4 K, with an accuracy (bias) of 3.1 K, and precision (standard deviation) of 1.5 K. Using the equation in Figure 3 to convert from DN to Kelvin, the RMSE = 2.0 K, mean bias = 0.7 K and standard deviation = 1.9 K. We show that using this simple method, it was possible to reduce systematic bias and overall error in our radiometric temperature datasets. The coefficients used in this study may not be applicable to other surveys using the Altum under different conditions. Further surveys at a range of temperatures and atmospheric conditions comparing these two datasets may yield more robust and representative coefficients, however this work only corrects some of the errors caused by the conditions specific to our surveys. While it is not possible to ascertain the exact causes of the errors, this correction serves as a simple atmospheric correction whereby different influencing factors (e.g., atmospheric, flight parameters and viewing angles) between flights have been corrected. Figure 3. Relationship between digital number (DN) derived from the thermal infrared band from airborne Altum images and ground control data from the Apogee SI 220 infrared radiometer. The green dotted line represents the conversion factor published by Micasense (Equation (9)). The grey shaded area indicates standard error.

UAS vs. EC Station Flux Estimates
In total 22 UAS-EC comparisons are made, as eight 30 min EC flux estimates were flagged as low quality due to uncertainties in flux estimates, or footprint estimates, and were not suitable for comparison. We tested two variants of the Two Source Energy Balance model, TSEB-PT (Priestley Taylor) and TSEB-DTD (Dual Time Difference). Firstly we compare uncorrected EC station fluxes with TSEB outputs, and secondly corrected EC fluxes using three different methods to close the energy balance, namely by; 1. adding imbalance residuals (Imb) to H (Res_H) and using uncorrected LE; 2. adding imbalance residuals (Imb) to LE (Res_LE) and using uncorrected H; 3. by maintaining the Bowen Ratio (BR); We principally use the study-specific conversion factor from Section 3.1 to produce radiometric temperature inputs for TSEB (in K), but also show resultant TSEB outputs using the standard conversion factor (Equation (9)), and compare these to the EC control flux estimates.
When assessing the performance of the UAS vs. EC flux estimates, we consider the overall error using RMSE (<50 W m −2 is considered ideal [70]) calculated, the mean bias (which can be corrected), the standard deviation (indicating the random error), and the regression coefficients which signal the predictive strength of the relationship (a slope of 1 indicates a 1:1 scaling relationship, and R 2 reflects the magnitude of the residuals compared to the regression line). The UAS and EC data are treated as "predicted" vs. "observed" respectively, and are plotted on the X and Y axes in regression plots accordingly [71]. We provide further analysis by examining fluxes from individual flights against EC flux time series.
The Trinity-Altum UAS produced very high quality thermal orthomosaics of our survey areas, with little blurring and very few artefacts, even in turbulent or gusty conditions. The TSEB outputs are also high-quality, an example of which can be seen in Figure 4, and in Appendix A Figures A1 and A2. TSEB-PT estimates of RnUAS were in closer agreement with RnEC than the DTD methods, with RMSE < 50 W m −2 across all sites (Table 5, Figure 5). The DTD method aims to reduce biases in H and G caused by errors in remote sensing of Trad (e.g., atmospheric transmission and/or inconsistencies in emissivity), and here DTD did provide more accurate and precise estimates of HUAS than PT. While the GEC-GUAS correlation was closer to unity (slope = 0.60 vs. 0.39) using the DTD method, errors were still quite high (> 50 W m −2 ) owing to the larger bias. The DTD method was more successful in reducing biases in HUAS than GUAS. This was also observed when we used the lower-accuracy Trad inputs (using Equation (9)) for TSEB; both HUAS and GUAS from the DTD model had improved slopes, R 2 , and significant P-values compared to PT model outputs ( Figure A3, errors are reported in Table A1). LEUAS and uncorrected LEEC are not comparable (Figure 5), which reflects the differences in energy balance closure between EC and UAS-based methods where TSEB forces closure by adding imbalance residuals to LEUAS.  Both TSEB models produced estimates of RnUAS in close agreement with RnEC throughout the day, with a tendency to over/underestimate in grassland/woodland regrowth sites (Fendt and Graswang/Mooseurach) ( Figure 6). Diurnal changes in RnEC are closely matched by RnUAS with a maximum error of 111 W m −2 at the Graswang site in September, which can be explained by the presence of cumulus clouds intermittently obscuring the sun after midday. This confirms an operational limitation in the UAS technique, whereby homogenous cloud cover is more likely to yield flux estimates which are consistent with EC station measurements, and is in agreement with [19]. GUAS was consistently higher than GEC and agreement is considered poor because the errors across all sites and times were high (RMSE > 50 W m −2 ). It should be noted that GEC is measured at the EC station site only within a radius of ca. 5 m (i.e., not estimated for the wider area within the EC station footprint) [12], and therefore spatially distributed GUAS are less comparable than other fluxes, especially where there is a difference in land cover across the survey site (e.g., grass vs. mowed grass). We investigated whether extracting the TSEB estimates of G from within a 5 m buffer of the EC station (compared to the EC station footprint) yields better agreement with EC station estimates ( Figure A4), however there was no significant difference between G estimates (using the 5 m buffer vs. EC footprint) for both the DTD (t = 0.88, degrees of freedom = 40.66, p-value = 0.39) and PT (t = 0.97, d.f. = 49.43, p-value = 0.34) TSEB methods. The RMSE was actually higher when using the 5 m buffer to extract G rather than the EC footprint for both DTD (RMSE = 61.3 vs. 52.3) and PT (RMSE = 60.2 vs. 51.8). For both TSEB models, the greatest errors were observed at Graswang in September and July. These results highlight a potential weakness in TSEB, where G is estimated as a function of Rns, indicating further work is required comparing GUAS and GEC in different meteorological and environmental conditions. GEC estimates in this study should be treated with caution as the in-situ soil measurements used to estimate GEC may not be high-quality.
The best correlation between UAS and corrected EC flux estimates was from the Res_LE method of closing the surface energy balance (whereby all imbalance residuals are assigned to LE, and uncorrected H is used) ( Table 6). As TSEB explicitly calculates HUAS, this an encouraging result, particularly for TSEB-DTD, RMSE, standard deviation and mean bias were all low, with a slight tendency for underestimated LEUAS and HUAS compared to EC data. Although the spread of the data were similar (both methods had R 2 ≈0.8), the regression coefficients were closer to unity using the DTD method (Figure 7). This finding is in agreement with [72] who compared EC estimates with lysimeters at an alpine grassland site (similar to Graswang), but in contrast to [20] who found closer UAS-EC agreement using the BR method at the Fendt site used in this study. Using the BR energy balance closure method, both DTD and PT methods estimated H poorly, but RMSE, accuracy and precision were all similar. The Res_H method of energy balance closure yielded the worst agreements between EC and UAS datasets, most likely because TSEB explicitly estimates H using the temperature gradient between Ta and Tc/Ts (as derived from Trad) [7]. When errors in Trad were introduced to the TSEB models (using Equation (9)), the UAS-EC correlations become highly insignificant using the PT method, however some improvements are observed in LE using the DTD method ( Figure A5, Table A2). It should be noted here that the larger errors in radiometric temperature erroneously shift the energy fluxes from a LE-to a H-dominated system because Ts and Tc become hotter than Ta.   Figure 8 shows how UAS and corrected EC flux estimates vary at different sites across the day. For the two grassland sites (Graswang_Sept and Fendt) both LEUAS and LEEC, and HUAS and HEC agree very closely, especially when Res_LE method is used to close the EC energy balance. The agreement at Mooseurach is worse for the Res_LE method (with large under/overestimates of LE/H) than the BR method. The atmospheric conditions were stable on the day of the survey, leading to a number of low-quality flux readings from the EC stations. However, on aggregate using all data (including low quality flux data) the mean LEUAS values for the small and tall tower footprints (233 and 231 W m −2 ) were similar to the mean LEEC values using the BR closure method (219 and 255 W m −2 ). There are a number of possible explanations for poor agreement at Mooseurach, firstly atmospheric stability suppresses turbulent fluxes and can cause errors both in EC and TSEB estimates [33,48]. Secondly, on the day of the survey the average difference between air and surface temperature was 0.4 K; as the temperature gradient between the air and Ts and Tc drive the sensible heat flux, errors are more likely when this differential is low [7]. Thirdly, at this time of the year many of the shrubs and trees were losing their leaves, while we made every effort to measure LAI using the LICOR LAI 2200 plant canopy analyser, we had to manually adjust the LAI to account for biases caused by the lack of green leaves [73]. As TSEB is very sensitive to LAI, this could be a significant source of error [23,74]. Fourthly, the site was water-logged, which may be problematic for TSEB because the model has not been tested in water-logged or flooded conditions, therefore the assumed relationship between Rns and G may not apply here, which may increase errors in G estimates. Lastly, the vegetation structure is very complex at this site which may increase the influence of processes which are not accounted for in either TSEB or EC station estimates/measurements (including canopy heat storage, biochemical canopy storage, and horizontal advection [35]), and further surveys under a variety of plant growth and meteorological conditions would help understand the capabilities of this UAS for flux mapping in heterogeneous vegetated areas. Despite the low sample size in Mooseurach, we include these data in the ensemble to demonstrate how site surveys with completely different vegetation and meteorological characteristics can be compared using the equipment and same methodology. The findings of this paper highlight an important limitation in comparing UAS and EC flux estimates. Both TSEB models force energy balance closure by assigning all residuals (after G and H have been calculated) to LE, whereas the EC station balance is rarely closed, and so requires a closure method to allow comparability with UAS estimates. The literature has shown that the most accurate EC energy balance closure methods vary with different sites/site conditions [35]; EC vs. lysimeter comparisons show better performance in the Res_LE in an alpine grassland [72], whereas in different pre-alpine grasslands the BR method outperforms [52,75]. In previous remote sensing-EC comparisons, both methods have also been adopted (see [76,77] for Res_LE, and [19,22] for BR methods. Here we show that Res_LE produces consistently lower errors across all sites than BR (Table 7), owing to a better regression slope (closer to 1) for both H and LE (Figure 7e and f versus  a and b). This result could be expected given that this EC energy balance closure method mirrors that of TSEB, but perhaps the conclusion to be drawn is that while the EC station method is a robust control method because it directly measures fluxes and reports uncertainty (usually in the form of a quality control flag, and or percentage uncertainty), the EC energy balance closure method itself is subject to uncertainty, and therefore the EC-UAS agreements should be treated cautiously. Both TSEB methods yielded low errors in conjunction with Res_LE (Table 6) and there was no significant difference in residuals for LE and H between DTD and LE (Welch Two Sample t-test; t = −0.009, d.f. =237.84, p-value = 0.99). However, the regression coefficients were more favorable for both LE and H using the DTD method ( Figure 7). These slight benefits in using DTD show that it may not be worth the additional logistical challenges involved in collecting the data for DTD, whereby a UAS survey must be conducted shortly after sunrise [18], however further work (at more sites, under a range of conditions) is necessary to support this.
While the EC stations report measurement uncertainty, the TSEB methodology does not. This study produces error statistics for TSEB flux estimates that could be used as a foundation to quantify uncertainty and error propagation in future UAS-derived estimates, which may be important if the TSEB is used to bridge the scale gap between ECand satellite-based measurements.
A clear path for further work is to collect more data using this UAS at non-grassland sites. Grassland sites are ideal systems for testing the UAS because the vegetation communities are simple, access is easy and flights can be conducted safely with a low risk of striking obstacles. We found that the errors were higher at the more complex regrowth woodland site (Mooseurach) compared to the grassland sites, however given the small sample size and EC quality control flags, it is unclear as to whether these errors are caused by the vegetation complexity, the non-ideal stable meteorological conditions [78], or instrumentation errors. Further surveys at this site at more active times of the year (e.g., high summer) would enable further inter-site comparison, and a more robust test of the UAS. The network of ICOS EC stations is distributed over many different land cover types, and given the long range of the UAS, many more EC-UAS comparisons could be made.
Converting from DN to radiometric temperature using our empirical method resulted in a 19% reduction in RMSE across all fluxes (and 20%, 28%, 17%, and 15% reductions for Rn, H, LE and G respectively), compared to the standard conversion factor. While we made important steps in correcting radiometric bias in the thermal infrared data, further improvements could be made by performing radiometric calibration using a black body reference source [29]. However, the method employed in this paper shows that reasonable environmental and atmospheric corrections can be applied without the use of a blackbody reference source in a controlled environment.
The UAS used in this study has the added benefits of providing spatially distributed estimates at very high resolution. This can enable researchers to discern how differences in vegetation types (e.g., forest vs. shrub vs. grass), phenotypes (of cropped systems), and land use practices (e.g., the use of different fertilisers) can influence the surface energy balance (and water usage). The system could be used as a low-cost complement to the EC station and can help resolve small-scale variability within the modelled footprint, however it is worth noting that the EC station methods are better suited to long term measurements, rather than mapping instantaneous fluxes. Given the flexibility and long range of the UAS, it can also be used to bridge the scale gap between EC station and satellitederived fluxes and/or for precision agriculture practices.

Conclusions
This study has successfully demonstrated the application of an off-the-shelf unmanned aerial system (e.g., Trinity F90+ and Micasense Altum) for reproducing eddy covariance station-derived energy flux estimates using the Two-Source Energy Balance Model. Utilising the long flight time and range of the UAS, we demonstrate the potential for these systems to be used to spatially map fluxes over heterogeneous land use/cover types. Comparisons of UAS-and eddy covariance-(industry standard) derived flux estimates, indicated good agreement between the two methodologies with errors <50 W m −2 for Rn, LE and H. Furthermore, the UAS method provides new and additional spatial information to EC flux estimates. We also demonstrate the importance of proper and appropriate UAS thermal sensor calibration, and its impact on derived fluxes estimates; specifically achieving 19% improvement in accuracies using a simple field-based calibration step.
Areas for future work should include increasing the range and diversity of land cover types; as well as the potential for upscaling this work from EC and UAS to satellite derived flux estimates. In addition, given the low errors in LE estimates, it could be further adapted to high-resolution evapotranspiration mapping, particularly in water-scarce agricultural settings which rely on irrigation.    Here we test two different extraction methods, taking the pixel mean within a 5 m buffer around the EC station (the area closest to the ground heat flux plates), versus using the EC station footprint. Table A2. The standard Micasense conversion factor (Equation (9)) was used to derive radiometric temperature. Here errors are computed by comparing TSEB flux outputs with EC station flux data, whereby the energy balanced is closed using three different methods (see Section 3.2 for a summary).