PhotonLabeler: An Inter-disciplinary Platform for Visual Interpretation and Labeling of ICESat-2 Geolocated Photon Data

Abstract: NASA's ICESat-2 space-borne photon-counting lidar mission is providing global elevation measurements that will benefit a variety of bio-geoscience research applications. Given the novelty of the elevation data and the derived data products from the ICESat-2 mission, the research community needs software tools that can facilitate photon-level analyses to support product validation and the development of new analysis methods. Here, we describe PhotonLabeler, a free graphical user interface (GUI) for manual labeling and visualization of ICESat-2 Geolocated Photon data (ATL03). Developed in MATLAB, the GUI facilitates the reading and display of ATL03 Hierarchical Data Format (HDF) files and the manual labeling of individual photons into target classes of choice using a number of point selection tools, and enables saving of labeled data in ASCII format. Other capabilities include saving and loading of labeling sessions to manage labeling tasks over time. We expect labeled data generated using the application to serve two main purposes. First, they can serve as ground truth for validating various products from the ICESat-2 mission, especially for study sites around the world that do not have existing reference datasets such as airborne lidar. Second, they can serve as training and validation data in the development of new algorithms for generating various ICESat-2 data products. We demonstrate the first use case through a validation case study for the land and vegetation product (ATL08), which provides canopy and terrain height estimates, over two sites. For the first site, located in northwestern Zambia, we used ICESat-2 ATL03 data acquired at night, and for our second site, in Texas, US, we used ATL03 data acquired during the day. The PhotonLabeler application is freely available as a compiled MATLAB binary to enable free access and utilization by interested researchers.


NASA's ICESat-2 (Ice, Cloud, and land Elevation Satellite-2) space-borne lidar mission launched in September 2018. A follow-up to the first ICESat mission, ICESat-2 carries the Advanced Topographic Laser Altimeter System (ATLAS), a photon-counting lidar that uses a green (532 nm) laser for range measurement. The ATLAS instrument has notable advantages over its predecessor, the GLAS sensor, including a reduced laser footprint (~14-17 m diameter, down from ~70 m) and increased along-track sampling through a higher pulse-repetition rate (10 kHz, or about 0.70 m footprint spacing). Other improvements include increased across-track sampling, achieved by partitioning the emitted laser into multiple profiling strong and weak beams [1,2]. Since its launch, ATLAS has provided unprecedented high-resolution three-dimensional along-track measurements of ice sheets, sea ice, [...] measurements from ancillary systems such as GPS each photon is precisely geolocated by determining its time, latitude, longitude, and height to generate the Geolocated Photon Data (ATL03) [7]. Photons reflected from the surface are detected along with those from the atmosphere or solar radiation, which usually results in a combination of signal and noise photons. Thus, ATL03 data tend to exhibit higher levels of noise for daytime acquisitions, given the greater impact of solar background illumination [6,8]. Notwithstanding the presence of noise in the data, the ICESat-2 mission, being space-based, has improved the possibility of characterizing the Earth's surface from local to global scales. The 532 nm wavelength of the ATLAS instrument has also created opportunities for both surface and bathymetric mapping, as green light can penetrate water and interact with the sub-surface in addition to interacting with the surface and vegetation [4,9,10].
ICESat-2 ATL03 photon data is enabling the generation of various standard data products for land ice, sea ice, the atmosphere, vegetation and land, oceans and inland water applications, which are available from the National Snow & Ice Data Center (NSIDC) website (https://nsidc.org/data/icesat-2/data-sets).

The availability of ICESat-2 products has spurred various studies to validate the products [11], develop alternative algorithms to generate similar datasets [2] or derive other products based on available products [4]. In those studies, reference data are a critical component for comparing results with existing products or assessing newly developed algorithms, and airborne lidar data have predominantly played this role. In Wang et al. [10], airborne lidar data were used to validate ground elevation and vegetation heights from the ATL08 product. Similar assessments have been carried out for land and vegetation height metrics [11]. Some studies also used airborne lidar data to simulate ICESat-2 data prior to the launch of the mission to enable the development of noise filtering algorithms [2]. However, airborne lidar data are not available in all areas, or might be too outdated to allow for an objective evaluation of current ICESat-2 products. In addition, some assessments require a photon-level understanding, such as noise filtering, photon classification or assessing sources of error, that airborne lidar data might not adequately support. In such cases, photon data manually labeled by experts through direct interpretation of raw ATL03 photon data can facilitate the analyses. Manually labeled data also eliminate positional errors due to the reference data, which enhances the overall analysis. Visual interpretation of data is not new in remote sensing and has supported various studies where ground truth data were not available [12-14]. Manually labeling photons is analogous to aerial photo interpretation, which consists of identifying features in remote sensing images through visual interpretation. Scientific software tools allowing the display and labeling of photon data would enhance the development of labeled datasets to support product validation and the development of analysis methods.
Euclidean space, signal points (real data) tend to cluster together compared to noise points, which tend to be randomly distributed in space (Figure 2a and b). Sometimes, ATL03 data are corrupted by instrument or geolocation errors, resulting in unnatural discontinuities and offsets in the data (Figure 2c). Such data sections are usually unusable and could also be interpreted as noise.
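The clustering behaviour described above is what makes signal photons visually separable from noise. As a rough illustration only (PhotonLabeler itself relies on manual interpretation, and this is not a method from the paper), the Python sketch below counts the neighbours of each photon within a fixed radius in along-track distance and height. The function names, the radius, the neighbour threshold and the sample coordinates are all hypothetical.

```python
def neighbor_counts(points, radius):
    """For each (along_track_m, height_m) point, count the other points within `radius`."""
    radius_sq = radius * radius
    counts = []
    for i, (x_i, z_i) in enumerate(points):
        n = 0
        for j, (x_j, z_j) in enumerate(points):
            # Squared Euclidean distance avoids an unnecessary square root.
            if i != j and (x_i - x_j) ** 2 + (z_i - z_j) ** 2 <= radius_sq:
                n += 1
        counts.append(n)
    return counts


def flag_dense_points(points, radius, min_neighbors):
    """Return True for points with at least `min_neighbors` neighbours (likely signal)."""
    return [c >= min_neighbors for c in neighbor_counts(points, radius)]


# Hypothetical photons: four clustered near a 100 m surface, two isolated ones.
photons = [(0.0, 100.0), (0.7, 100.2), (1.4, 99.9), (2.1, 100.1),
           (1.0, 160.0), (1.8, 35.0)]
print(flag_dense_points(photons, radius=2.0, min_neighbors=2))
# prints [True, True, True, True, False, False]
```

The brute-force pairwise loop is adequate for a short profile segment; a spatial index would be the natural choice for full granules.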

Signal points could further be interpreted into terrain (e.g. land or sea ice terrain), above-terrain point classes (e.g. forest canopies, buildings and clouds) and below-terrain surfaces (e.g. bathymetry) by leveraging attributes specific to the data and using ancillary data. For instance, elevation [...]

We assessed the level of agreement between ATL08 height metrics and corresponding estimates derived from manually labeled data using regression analysis. We took values calculated from manually labeled data to be the reference or observed variables and took ATL08 data as predicted values, using the regression coefficient of determination (R²) as a measure of correlation. We also calculated a mean bias metric, defined as the mean of the differences between reference and predicted values, as a measure of precision and to shed light on under- and over-estimation of the ATL08 metrics with respect to manually labeled data, which we took as ground truth.
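The two agreement measures described above can be sketched in a few lines. The following Python example is a minimal illustration, not the authors' implementation: the function names and sample heights are hypothetical, and R² is assumed here to be the squared Pearson correlation, which equals the coefficient of determination of a simple linear regression with an intercept.

```python
from math import fsum


def mean_bias(reference, predicted):
    """Mean of (reference - predicted) differences, as defined in the text."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return fsum(r - p for r, p in zip(reference, predicted)) / len(reference)


def r_squared(reference, predicted):
    """R^2 of a simple linear regression between the two series."""
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_r = fsum(reference) / n
    mean_p = fsum(predicted) / n
    s_rp = fsum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    s_rr = fsum((r - mean_r) ** 2 for r in reference)
    s_pp = fsum((p - mean_p) ** 2 for p in predicted)
    if s_rr == 0 or s_pp == 0:
        raise ValueError("R^2 is undefined when a series is constant")
    return s_rp ** 2 / (s_rr * s_pp)


# Hypothetical per-segment heights in metres (not values from the study).
pl_heights = [10.0, 12.0, 15.0, 20.0]     # reference: manually labeled data
atl08_heights = [9.0, 11.0, 14.0, 19.0]   # predicted: ATL08 estimates
print(mean_bias(pl_heights, atl08_heights))  # prints 1.0
print(r_squared(pl_heights, atl08_heights))  # prints 1.0
```

With this sign convention, a positive mean bias indicates that the predicted (ATL08) values are on average lower than the reference, and a negative value indicates the opposite. The example also shows that a constant offset leaves R² at 1.0, which is why the two measures are reported together.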

The total numbers of ATL08 segments with valid canopy height estimates matched with labeled data from the Zambian and Texas sites were 84 and 90, respectively. The numbers of segments with valid terrain height estimates were 92 and 88 for the Zambian and Texas sites, respectively. Table 3 and Table 4 summarize the relationships between various height metrics compared with corresponding estimates derived from manually labeled point data. In general, ATL08 height metrics [...] The precision (Table 4) [...] Lastly, ATL08 terrain height estimates showed high precision with respect to PL metrics for both sites. Figure 7 shows graphical relationships between ATL08 absolute canopy, relative canopy and terrain data with corresponding PL data for the minimum, mean and maximum height metrics.

[...] of Southern Africa, is another example. Insight or cues on developing algorithms could also be generated from visualizing data in different ecoregions.

The case studies on the manual validation of ATL08 data provided a glimpse into the accuracy of ATL08 data for day and night acquisitions. The results were generally promising, with high correlations (R² > 0.8) and precision (mean biases < 5) between ATL08 estimates and estimates generated from manually labeled data. However, these observations, like the assessment itself, are limited in scope. Further validation incorporating data from various ecoregions, seasons and noise levels is still needed to provide a more complete view of the accuracy of the ATL08 estimates. We also acknowledge that manually labeled data are not immune to error, and inconsistencies may arise between how the ATL08 algorithms define surfaces and how we manually labeled data. Given the high correlations between ATL08 and PL estimates, we think this was not a major issue for this assessment. For future studies intending to do similar assessments on a large scale, we recommend developing labeling protocols to enhance consistency among labeling experts and across data from different environments.

Additional functionality in the PhotonLabeler application is in the works to enhance productivity and general user-friendliness. Given that some ICESat-2 products such as ATL06 and ATL08 also contain photon-level classification data, one capability envisioned is to enable users to start labeling